Establishes a per-module code review workflow under code-reviews/ and
records the 2026-05-16 baseline review (commit 9c60592): 241 findings
across all src/ modules (6 Critical, 46 High, 100 Medium, 89 Low).
This is the clean starting point for remediation work.
# Code Review — ManagementService

| Field | Value |
|-------|-------|
| Module | `src/ScadaLink.ManagementService` |
| Design doc | `docs/requirements/Component-ManagementService.md` |
| Status | Reviewed |
| Last reviewed | 2026-05-16 |
| Reviewer | claude-agent |
| Commit reviewed | `9c60592` |
| Open findings | 13 |

## Summary

The ManagementService module is a thin command-dispatch layer: a single `ManagementActor`
fronts every administrative operation, an HTTP `POST /management` endpoint authenticates and
forwards to it, and a SignalR `DebugStreamHub` provides real-time debug streaming. The code
is consistently structured, and the role-based authorization gate (`GetRequiredRole`) is
broadly correct and well tested. However, the review surfaced a significant **security
theme**: site-scope enforcement, which the design document requires for instance- and
site-targeted Deployment operations, is applied inconsistently. Several query handlers and
all remote-query/debug handlers perform no site-scope check at all, allowing a site-scoped
Deployment user to read or act on sites outside their scope. A second theme is **Akka.NET
convention drift**: the actor offloads all work to `Task.Run` instead of using `PipeTo`,
declares no supervision strategy, and the contract messages carry a loosely-typed `object`
payload. There are also resource-management defects in the HTTP endpoint (`JsonDocument`
instances never disposed) and dead/unused configuration. None of the findings are
crash-class, but the site-scope gaps are High severity because they are a real
authorization bypass with no workaround.

## Checklist coverage

| # | Category | Examined | Notes |
|---|----------|----------|-------|
| 1 | Correctness & logic bugs | + | `HandleResolveRoles` builds `RoleMapper` by hand; `ResolveRolesCommand` is a stale dispatch path. See 008, 011. |
| 2 | Akka.NET conventions | + | `Task.Run` instead of `PipeTo`, no supervision strategy, `object`-typed message payload. See 004, 005, 012. |
| 3 | Concurrency & thread safety | + | Actor is stateless so `Task.Run` does not corrupt state, but it defeats actor-thread serialization (004). `Sender` correctly captured to a local before the closure. |
| 4 | Error handling & resilience | + | Exceptions are caught and mapped uniformly; `SiteScopeViolationException` mapped to `Unauthorized`. Audit-logging consistency issue noted in 009. |
| 5 | Security | + | Site-scope enforcement missing on query/remote/debug paths. See 001, 002, 003. |
| 6 | Performance & resource management | + | `JsonDocument` instances never disposed in the HTTP endpoint. See 006. |
| 7 | Design-document adherence | + | Design doc states remote queries enforce site scoping; code does not. `ManagementServiceOptions` reserved-for-future config is unused. See 001, 010. |
| 8 | Code organization & conventions | + | Mixed serializers (Newtonsoft in actor, System.Text.Json in endpoint); inconsistent audit logging across mutations. See 007, 009. |
| 9 | Testing coverage | + | Authorization is well covered; site-scope enforcement, the HTTP endpoint, `DebugStreamHub`, and remote-query handlers have no tests. See 013. |
| 10 | Documentation & comments | + | XML docs are accurate where present; `ManagementServiceOptions` and `ResolveRolesCommand` paths are undocumented dead code (010, 011). |

## Findings

### ManagementService-001 — Remote-query and debug-snapshot handlers bypass site-scope enforcement

| | |
|--|--|
| Severity | High |
| Category | Security |
| Status | Open |
| Location | `src/ScadaLink.ManagementService/ManagementActor.cs:1465`, `:1481`, `:1493`, `:641`, `:649` |

**Description**

The design document (`Component-ManagementService.md`, Authorization section) states that
"Site scoping is enforced for site-scoped Deployment users" and lists
"debug snapshot, parked message queries, site event log queries" among the Deployment-role
operations. `HandleQueryEventLogs`, `HandleQueryParkedMessages`, `HandleDebugSnapshot`,
`HandleRetryParkedMessage`, and `HandleDiscardParkedMessage` make no call to `EnforceSiteScope`
or `EnforceSiteScopeForInstance`. A Deployment user scoped to site A can therefore query
site B's event logs and parked messages, retry or discard another site's parked messages, and
pull a debug snapshot of any instance simply by supplying a different `SiteIdentifier` or
`InstanceId`. This is an authorization bypass with no workaround.

**Recommendation**

In each of these handlers, resolve the target site and call site-scope enforcement before
delegating to `CommunicationService`. For the `SiteIdentifier`-keyed handlers, look up the
`Site` by identifier and enforce against `Site.Id`. For `DebugSnapshotCommand` the instance
is already loaded, so call `EnforceSiteScope(user, instance.SiteId)`. Both changes require
threading `AuthenticatedUser` into these handlers; it is currently dropped.
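
A minimal sketch of the kind of check each handler could run before delegating. The type
shapes here are assumptions for illustration: `AuthenticatedUser` is modelled as a record with
a `PermittedSiteIds` collection of strings, an empty collection is assumed to mean "not
site-scoped", and `SiteScopeGuard` is a hypothetical name. The real `EnforceSiteScope`
signature in `ManagementActor` may differ.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-ins for the real types; their shapes are assumptions.
public sealed record AuthenticatedUser(string Username, IReadOnlyCollection<string> PermittedSiteIds);

public sealed class SiteScopeViolationException : Exception
{
    public SiteScopeViolationException(string username, string siteId)
        : base($"User '{username}' is not permitted to access site '{siteId}'.") { }
}

public static class SiteScopeGuard
{
    // Assumption: an empty PermittedSiteIds collection means the user is not site-scoped.
    public static void EnforceSiteScope(AuthenticatedUser user, string siteId)
    {
        if (user.PermittedSiteIds.Count == 0)
            return; // unrestricted user

        if (!user.PermittedSiteIds.Contains(siteId))
            throw new SiteScopeViolationException(user.Username, siteId); // out-of-scope target
    }
}
```

A handler would call the guard after resolving the target site and before any call that
reads or mutates site data, so the existing exception mapping to `Unauthorized` applies.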

**Resolution**

_Unresolved._

### ManagementService-002 — Single-entity query handlers leak data across site scope

| | |
|--|--|
| Severity | High |
| Category | Security |
| Status | Open |
| Location | `src/ScadaLink.ManagementService/ManagementActor.cs:510`, `:673`, `:733`, `:774`, `:631`, `:624` |

**Description**

`HandleListInstances` and `HandleListSites` correctly filter their results by the user's
`PermittedSiteIds`, but the single-entity query handlers do not. `HandleGetInstance`,
`HandleGetSite`, `HandleListAreas`, and `HandleGetDataConnection` fetch by ID with no
site-scope check, so a site-scoped Deployment user can read any instance, site, area tree,
or data connection by ID even though that site is excluded from their scope. Filtering the
list endpoints but not the get-by-ID endpoints undermines the scoping model.
(`HandleGetDeploymentDiff` and `HandleListInstanceAlarmOverrides` do enforce scope, which
suggests the omission elsewhere is unintentional.)

**Recommendation**

Apply `EnforceSiteScopeForInstance` in `HandleGetInstance`, and `EnforceSiteScope` against
the resolved site ID in `HandleGetSite`, `HandleListAreas`, and `HandleGetDataConnection`
(for data connections, scope by the connection's `SiteId`).

**Resolution**

_Unresolved._

### ManagementService-003 — DebugStreamHub.SubscribeInstance performs no per-instance authorization

| | |
|--|--|
| Severity | High |
| Category | Security |
| Status | Open |
| Location | `src/ScadaLink.ManagementService/DebugStreamHub.cs:104` |

**Description**

`OnConnectedAsync` authenticates the WebSocket connection and verifies the caller holds the
`Deployment` role, but `SubscribeInstance(int instanceId)` accepts any instance ID and starts
a stream without checking that the authenticated user is scoped to that instance's site. A
site-scoped Deployment user can therefore subscribe to the live debug stream (attribute
values, alarm states) of an instance belonging to a site outside their scope. This is the
streaming equivalent of findings 001 and 002.

**Recommendation**

Resolve the instance's site inside `SubscribeInstance` and reject the subscription if the
authenticated user's permitted-site set does not include it. The authenticated identity
established in `OnConnectedAsync` must be persisted on the connection (e.g. in
`Context.Items`) so it is available to `SubscribeInstance`.

**Resolution**

_Unresolved._

### ManagementService-004 — Actor offloads work to Task.Run instead of using PipeTo

| | |
|--|--|
| Severity | Medium |
| Category | Akka.NET conventions |
| Status | Open |
| Location | `src/ScadaLink.ManagementService/ManagementActor.cs:61` |

**Description**

`HandleEnvelope` runs every command on a thread-pool thread via `Task.Run(async () => ...)`
and replies from inside the continuation. This is the anti-pattern the project's Akka.NET
conventions warn against; the canonical approach is to start the async work and `PipeTo`
its result back to `Self`/`Sender`. Although `Sender` is correctly copied to a local before
the closure, the current code: (a) lets multiple commands execute fully concurrently with no
actor-thread serialization, so the actor provides no ordering or back-pressure guarantees
and is an actor in name only; (b) cannot be paused, supervised, or made to honour a mailbox
bound; (c) is shielded from synchronous faults only because every path is inside the
try/catch, so any future code path that throws synchronously before the `Task.Run` body
would escape it.

**Recommendation**

Replace `Task.Run` with a method that returns the `Task` and `PipeTo` the mapped result
(`ManagementSuccess`/`ManagementError`/`ManagementUnauthorized`) back to the captured sender,
mapping faults in the `PipeTo` failure continuation. If genuine parallelism is desired, make
that explicit with a router/dispatcher rather than ad-hoc `Task.Run`.
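
The following sketch shows the general shape of the `PipeTo` pattern, assuming simplified
message records (`ManagementEnvelope`, `ManagementSuccess`, `ManagementError` are modelled
here with hypothetical fields) and a placeholder `DispatchAsync`. `PipeToSketchActor` is an
illustrative name, not the real actor. Note that piping alone does not serialize command
execution; it only routes success and failure results through messages to the captured sender.

```csharp
using System;
using System.Threading.Tasks;
using Akka.Actor;

// Hypothetical message shapes; the real records may carry different fields.
public sealed record ManagementEnvelope(object Command, string CorrelationId);
public sealed record ManagementSuccess(string CorrelationId, string Json);
public sealed record ManagementError(string CorrelationId, string Message);

public sealed class PipeToSketchActor : ReceiveActor
{
    public PipeToSketchActor()
    {
        Receive<ManagementEnvelope>(envelope =>
        {
            var replyTo = Sender; // capture before any async work starts

            // Start the async work directly (no Task.Run) and pipe the outcome to the caller.
            DispatchAsync(envelope).PipeTo(
                replyTo,
                Self,
                success: json => new ManagementSuccess(envelope.CorrelationId, json),
                failure: ex => new ManagementError(envelope.CorrelationId, ex.Message));
        });
    }

    // Placeholder for the real dispatch: only string commands are "supported" here.
    private static async Task<string> DispatchAsync(ManagementEnvelope envelope)
    {
        await Task.Yield();
        return envelope.Command is string text
            ? $"\"{text}\""
            : throw new NotSupportedException($"Unknown command type {envelope.Command.GetType().Name}");
    }
}
```

Because `DispatchAsync` is an `async` method, exceptions thrown inside it fault the returned
task and reach the `failure` mapping rather than escaping into the actor's message handler.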

**Resolution**

_Unresolved._

### ManagementService-005 — ManagementActor declares no supervision strategy

| | |
|--|--|
| Severity | Low |
| Category | Akka.NET conventions |
| Status | Open |
| Location | `src/ScadaLink.ManagementService/ManagementActor.cs:33` |

**Description**

The project conventions call for explicit supervision strategies (Resume for coordinator
actors). `ManagementActor` is a long-lived coordinator-style actor but overrides no
`SupervisorStrategy` and defines no `PreRestart`/`PostRestart` behaviour. In practice it
spawns no children so the default strategy is rarely exercised, but an explicit strategy
should still be declared for clarity and to match the documented convention; it also matters
if children are added later (e.g. if finding 004 introduces worker actors).

**Recommendation**

Add an explicit `protected override SupervisorStrategy SupervisorStrategy()` returning a
Resume-based strategy, consistent with other central coordinator actors.
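
As an illustration only, an explicit Resume-based strategy could look like the sketch below.
`SupervisedCoordinatorSketch` is a hypothetical actor with placeholder echo behaviour, and
resuming on every exception type is an assumption; the project's other coordinator actors may
use a narrower decider.

```csharp
using System;
using Akka.Actor;

public sealed class SupervisedCoordinatorSketch : ReceiveActor
{
    public SupervisedCoordinatorSketch()
    {
        // Placeholder behaviour: echo every message back to the sender.
        ReceiveAny(message => Sender.Tell(message, Self));
    }

    // Applies to children of this actor: a failing child resumes processing
    // with its state intact instead of being restarted.
    protected override SupervisorStrategy SupervisorStrategy()
    {
        return new OneForOneStrategy(ex => Directive.Resume);
    }
}
```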

**Resolution**

_Unresolved._

### ManagementService-006 — JsonDocument instances never disposed in the HTTP endpoint

| | |
|--|--|
| Severity | Medium |
| Category | Performance & resource management |
| Status | Open |
| Location | `src/ScadaLink.ManagementService/ManagementEndpoints.cs:83`, `:112` |

**Description**

`JsonDocument` is `IDisposable` (it rents its buffers from `ArrayPool`). `HandleRequest`
parses the request body into `doc` at line 83 and never disposes it, and line 112
(`JsonDocument.Parse("{}")`) allocates a second document inline that is also never disposed.
Every management HTTP call therefore fails to return rented buffers to the pool, increasing
GC pressure and pool churn under load.

**Recommendation**

Wrap the parsed document in `using var doc = ...`. For the empty-payload fallback, avoid
allocating a `JsonDocument` entirely: deserialize from the literal string `"{}"` or an empty
object, or restructure so the fallback path does not parse a throwaway document.
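
A small sketch of both suggestions, assuming the request body is a JSON object with an
optional `payload` property (that property name and the `RequestBodySketch` helper are
hypothetical, not taken from `ManagementEndpoints`). The document is disposed on scope exit,
and the empty-payload fallback is a string literal rather than a second parsed document.

```csharp
using System.Text.Json;

public static class RequestBodySketch
{
    // Returns the raw JSON of the "payload" property, or "{}" when it is absent.
    public static string ExtractPayloadJson(string body)
    {
        // Disposed when the method returns, which hands the rented buffers back to the pool.
        using var doc = JsonDocument.Parse(body);

        var root = doc.RootElement;

        // GetRawText copies the JSON text out, so the result stays valid after disposal.
        return root.ValueKind == JsonValueKind.Object && root.TryGetProperty("payload", out var payload)
            ? payload.GetRawText()
            : "{}";
    }
}
```

The returned string can then be passed to `JsonSerializer.Deserialize` for the target command
type without keeping any `JsonDocument` alive.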

**Resolution**

_Unresolved._

### ManagementService-007 — Inconsistent and cycle-prone serialization of repository entities

| | |
|--|--|
| Severity | Medium |
| Category | Code organization & conventions |
| Status | Open |
| Location | `src/ScadaLink.ManagementService/ManagementActor.cs:67`; `src/ScadaLink.ManagementService/ManagementEndpoints.cs:113` |

**Description**

The actor serializes every command result with `Newtonsoft.Json` (`JsonConvert.SerializeObject`)
while the HTTP endpoint deserializes payloads with `System.Text.Json`. Beyond the
inconsistency, `JsonConvert.SerializeObject` is applied directly to EF-backed entities
returned by repositories (e.g. `Site`, `DataConnection`, `NotificationList` with a
`Recipients` collection, `Template` with children). With default Newtonsoft settings any
bidirectional navigation property produces a `JsonSerializationException` for self-referencing
loops, and even without cycles this serializes lazy/navigation state the CLI does not expect.

**Recommendation**

Standardise on one serializer (the rest of the HTTP path uses `System.Text.Json`). Serialize
explicit DTOs / projections rather than EF entities, or configure
`ReferenceLoopHandling.Ignore` and ignore navigation properties. Verify that handlers
returning rich entity graphs (`HandleGetTemplate`, `HandleUpdateNotificationList`) round-trip
correctly.
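
To illustrate the DTO option, the sketch below projects an entity with a back-reference into
a flat record before serializing with `System.Text.Json`. The entity shapes
(`NotificationList`, `Recipient` and their properties) are guesses made for the example, and
`NotificationListDto` and `NotificationListMapper` are hypothetical names.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Text.Json;

// Hypothetical entity shapes with a bidirectional navigation property.
public sealed class NotificationList
{
    public int Id { get; set; }
    public string Name { get; set; } = "";
    public List<Recipient> Recipients { get; } = new();
}

public sealed class Recipient
{
    public string Address { get; set; } = "";
    public NotificationList? List { get; set; } // back-reference that creates a cycle
}

// Flat projection that carries only what a client needs; no navigation state.
public sealed record NotificationListDto(int Id, string Name, IReadOnlyList<string> Recipients);

public static class NotificationListMapper
{
    public static string ToJson(NotificationList entity)
    {
        var dto = new NotificationListDto(
            entity.Id,
            entity.Name,
            entity.Recipients.Select(r => r.Address).ToList());

        return JsonSerializer.Serialize(dto);
    }
}
```

Because the DTO holds no reference back to the entity graph, the serializer never sees the
cycle, and the wire format no longer changes when navigation properties are added.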

**Resolution**

_Unresolved._

### ManagementService-008 — HandleResolveRoles constructs RoleMapper manually instead of via DI

| | |
|--|--|
| Severity | Low |
| Category | Correctness & logic bugs |
| Status | Open |
| Location | `src/ScadaLink.ManagementService/ManagementActor.cs:285` |

**Description**

Every other handler resolves its collaborators from the scoped `IServiceProvider`.
`HandleResolveRoles` instead does `new RoleMapper(sp.GetRequiredService<ISecurityRepository>())`,
bypassing DI. If `RoleMapper` ever gains a dependency, caching, or options, this hand-built
instance silently diverges from the DI-registered one. It is also inconsistent with
`ManagementEndpoints`, which resolves `RoleMapper` from DI.

**Recommendation**

Resolve `RoleMapper` via `sp.GetRequiredService<RoleMapper>()` like every other dependency.

**Resolution**

_Unresolved._

### ManagementService-009 — Audit logging applied inconsistently across mutating handlers

| | |
|--|--|
| Severity | Medium |
| Category | Error handling & resilience |
| Status | Open |
| Location | `src/ScadaLink.ManagementService/ManagementActor.cs:357`, `:1134`, `:1085`, `:526`, `:1275` |

**Description**

The design doc states "All mutating operations are audit logged." Some handlers call
`AuditAsync` explicitly (`HandleCreateInstance`, `HandleCreateSite`, all repository-direct
external-system/notification/security/area mutations), but the handlers that delegate to a
domain service do **not**: `HandleCreateTemplate`/`HandleUpdateTemplate`/`HandleDeleteTemplate`,
all template-member handlers (`HandleAddAttribute` ... `HandleDeleteComposition`), template-folder
handlers, shared-script handlers, `HandleDeployArtifacts`, `HandleDeployInstance`,
`HandleEnableInstance`/`Disable`/`Delete`, and the instance-binding/override handlers. This is
correct only if every one of those services performs its own audit logging internally; the
mixed pattern makes that impossible to verify by reading this module and creates a real risk
of silent audit gaps for template authoring and deployment operations.

**Recommendation**

Decide on one layer that owns auditing. Either route all mutations through services that audit
internally (and remove the explicit `AuditAsync` calls here), or audit uniformly in the actor
after every successful mutation. Document the chosen contract so the inconsistency cannot
recur, and confirm template/deployment services actually audit.

**Resolution**

_Unresolved._

### ManagementService-010 — ManagementServiceOptions.CommandTimeout is defined but never used

| | |
|--|--|
| Severity | Low |
| Category | Design-document adherence |
| Status | Open |
| Location | `src/ScadaLink.ManagementService/ManagementServiceOptions.cs:5`; `src/ScadaLink.ManagementService/ManagementEndpoints.cs:16` |

**Description**

`ManagementServiceOptions.CommandTimeout` is bound from configuration in
`ServiceCollectionExtensions`, but no code reads it. The HTTP endpoint instead hard-codes
`AskTimeout = TimeSpan.FromSeconds(30)`. The design doc describes the options section as
"Reserved for future configuration — e.g., command timeout overrides", yet a concrete
`CommandTimeout` property already exists and is silently ignored, so an operator who sets it
in `appsettings.json` gets no effect.

**Recommendation**

Either consume `ManagementServiceOptions.CommandTimeout` in `ManagementEndpoints.HandleRequest`
(inject `IOptions<ManagementServiceOptions>`), or remove the property until it is wired up so
configuration cannot be set with no effect.
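
If the first option is chosen, the wiring could be as small as the sketch below. It assumes
`CommandTimeout` is a `TimeSpan` (the real property type is not shown in this review) and
uses a hypothetical `AskTimeoutProvider` in place of the endpoint code.

```csharp
using System;
using Microsoft.Extensions.Options;

public sealed class ManagementServiceOptions
{
    // Assumed type and default; the default mirrors the 30-second value hard-coded today.
    public TimeSpan CommandTimeout { get; set; } = TimeSpan.FromSeconds(30);
}

// Hypothetical consumer standing in for the endpoint's ask-timeout lookup.
public sealed class AskTimeoutProvider
{
    private readonly ManagementServiceOptions _options;

    public AskTimeoutProvider(IOptions<ManagementServiceOptions> options)
    {
        _options = options.Value;
    }

    // The configured value replaces the hard-coded constant.
    public TimeSpan AskTimeout => _options.CommandTimeout;
}
```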

**Resolution**

_Unresolved._

### ManagementService-011 — ResolveRolesCommand dispatch path is stale dead code

| | |
|--|--|
| Severity | Low |
| Category | Correctness & logic bugs |
| Status | Open |
| Location | `src/ScadaLink.ManagementService/ManagementActor.cs:273`, `:283` |

**Description**

The design doc states the HTTP endpoint "collapses the CLI's previous two-step flow
(ResolveRoles + actual command) into a single HTTP round-trip", and indeed `ManagementEndpoints`
performs LDAP auth and role resolution itself before dispatching. The `ResolveRolesCommand`
case in `DispatchCommand` is therefore unreachable from the HTTP path. It remains reachable
only via a raw ClusterClient sender, but a caller able to send `ResolveRolesCommand` could
enumerate role mappings for arbitrary LDAP groups with no role requirement
(`GetRequiredRole` returns null for it). That is a minor information-disclosure surface for a
path the design says no longer exists.

**Recommendation**

If the two-step flow is genuinely retired, remove the `ResolveRolesCommand` dispatch case, its
handler, and the command class. If it must remain for non-HTTP clients, document why and
confirm that exposing role-mapping data without a role requirement is intended.

**Resolution**

_Unresolved._

### ManagementService-012 — ManagementEnvelope carries a loosely-typed object payload

| | |
|--|--|
| Severity | Low |
| Category | Akka.NET conventions |
| Status | Open |
| Location | `src/ScadaLink.Commons/Messages/Management/ManagementEnvelope.cs:7`; `src/ScadaLink.ManagementService/ManagementActor.cs:132` |

**Description**

`ManagementEnvelope.Command` is typed `object`, so the actor relies on a large open-ended
`switch` with a `NotSupportedException` default for unknown types. While the individual
command records are immutable, `object` defeats compile-time exhaustiveness: adding a new
command record produces no compiler signal that `DispatchCommand` (and `GetRequiredRole`)
need updating, and a typo or unregistered command surfaces only as a runtime exception. The
message contract is also harder to evolve safely under the additive-only rule.

**Recommendation**

Introduce a marker interface (e.g. `IManagementCommand`) implemented by every command record
and type the envelope payload as that interface. This documents the contract, lets analyzers
flag unhandled cases, and keeps `ManagementCommandRegistry`'s reflection scan precise.
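
A rough sketch of the marker-interface shape, using two made-up commands
(`GetSiteCommand`, `ListSitesCommand`) and a simplified envelope; none of these signatures
are taken from the real contract. The C# compiler still needs a default arm because the
interface is open, so the immediate gain is that non-command payloads no longer compile.

```csharp
using System;

// Marker interface: every command record opts in to the management contract.
public interface IManagementCommand { }

// Hypothetical commands used only for illustration.
public sealed record GetSiteCommand(int SiteId) : IManagementCommand;
public sealed record ListSitesCommand() : IManagementCommand;

// The envelope payload is constrained to commands at compile time.
public sealed record ManagementEnvelope(IManagementCommand Command, string CorrelationId);

public static class DispatchSketch
{
    public static string Describe(ManagementEnvelope envelope) => envelope.Command switch
    {
        GetSiteCommand c => $"get site {c.SiteId}",
        ListSitesCommand => "list sites",
        // Still required: the interface is open, so unknown implementations are possible.
        _ => throw new NotSupportedException(envelope.Command.GetType().Name),
    };
}
```

With this shape, `new ManagementEnvelope("not a command", "c1")` is rejected by the compiler
instead of reaching the `NotSupportedException` branch at runtime.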

**Resolution**

_Unresolved._

### ManagementService-013 — No tests for site-scope enforcement, the HTTP endpoint, or DebugStreamHub

| | |
|--|--|
| Severity | Medium |
| Category | Testing coverage |
| Status | Open |
| Location | `tests/ScadaLink.ManagementService.Tests/ManagementActorTests.cs:1` |

**Description**

`ManagementActorTests` covers role-based authorization, success/error mapping, and correlation
IDs thoroughly, but several critical paths are untested: (a) site-scope enforcement:
`EnforceSiteScope`/`EnforceSiteScopeForInstance` and the `SiteScopeViolationException` ->
`Unauthorized` mapping have no test, which is why the gaps in findings 001 and 002 went
unnoticed; (b) `ManagementEndpoints`: Basic Auth decoding, malformed-header handling,
LDAP/role resolution, command deserialization, and HTTP status mapping have zero coverage;
(c) `DebugStreamHub` authentication, subscribe/unsubscribe lifecycle, and
`ManagementCommandRegistry.Resolve` are untested. The `Envelope` test helper always passes
`Array.Empty<string>()` for permitted sites, so no test ever exercises a site-scoped user.

**Recommendation**

Add tests that exercise a site-scoped Deployment user against in-scope and out-of-scope
targets for instance and site operations, asserting `ManagementUnauthorized` on violations.
Add `WebApplicationFactory`-based tests for `ManagementEndpoints` covering auth failures,
malformed bodies, unknown commands, and the 200/400/401/403/504 mappings.

**Resolution**

_Unresolved._