docs: add code review process and baseline review of all 19 modules
Establishes a per-module code review workflow under code-reviews/ and
records the 2026-05-16 baseline review (commit 9c60592): 241 findings
across all src/ modules (6 Critical, 46 High, 100 Medium, 89 Low).
This is the clean starting point for remediation work.

code-reviews/ManagementService/findings.md

# Code Review — ManagementService

| Field | Value |
|-------|-------|
| Module | `src/ScadaLink.ManagementService` |
| Design doc | `docs/requirements/Component-ManagementService.md` |
| Status | Reviewed |
| Last reviewed | 2026-05-16 |
| Reviewer | claude-agent |
| Commit reviewed | `9c60592` |
| Open findings | 13 |


## Summary

The ManagementService module is a thin command-dispatch layer. A single `ManagementActor`
fronts every administrative operation. An HTTP `POST /management` endpoint authenticates
callers and forwards their commands to it. A SignalR `DebugStreamHub` provides real-time
debug streaming.

The code is consistently structured, and the role-based authorization gate
(`GetRequiredRole`) is broadly correct and well tested. However, the review surfaced a
significant **security theme**: site-scope enforcement, which the design document requires
for instance- and site-targeted Deployment operations, is applied inconsistently. Several
single-entity query handlers, and all remote-query and debug handlers, perform no site-scope
check at all. A site-scoped Deployment user can therefore read or act on sites outside their
scope.

A second theme is **Akka.NET convention drift**. The actor offloads all work to `Task.Run`
instead of using `PipeTo` and declares no supervision strategy. It also receives commands
through an envelope whose payload is a loosely typed `object`.

There are also resource-management defects in the HTTP endpoint (`JsonDocument` instances
are never disposed) and dead, unused configuration. None of the findings is crash-class, but
the site-scope gaps are rated High because they are a real authorization bypass with no
workaround.


## Checklist coverage

| # | Category | Examined | Notes |
|---|----------|----------|-------|
| 1 | Correctness & logic bugs | + | `HandleResolveRoles` builds `RoleMapper` by hand; `ResolveRolesCommand` is a stale dispatch path. See 008, 011. |
| 2 | Akka.NET conventions | + | `Task.Run` instead of `PipeTo`, no supervision strategy, `object`-typed message payload. See 004, 005, 012. |
| 3 | Concurrency & thread safety | + | The actor is stateless, so `Task.Run` does not corrupt state, but it defeats actor-thread serialization (004). `Sender` is correctly captured to a local before the closure. |
| 4 | Error handling & resilience | + | Exceptions are caught and mapped uniformly; `SiteScopeViolationException` is mapped to `Unauthorized`. Audit-logging consistency issue noted in 009. |
| 5 | Security | + | Site-scope enforcement is missing on the query, remote-query, and debug paths. See 001, 002, 003. |
| 6 | Performance & resource management | + | `JsonDocument` instances are never disposed in the HTTP endpoint. See 006. |
| 7 | Design-document adherence | + | The design doc states that remote queries enforce site scoping; the code does not. `ManagementServiceOptions.CommandTimeout` is bound from configuration but never read. See 001, 010. |
| 8 | Code organization & conventions | + | Mixed serializers (Newtonsoft in the actor, System.Text.Json in the endpoint); inconsistent audit logging across mutations. See 007, 009. |
| 9 | Testing coverage | + | Authorization is well covered; site-scope enforcement, the HTTP endpoint, `DebugStreamHub`, and the remote-query handlers have no tests. See 013. |
| 10 | Documentation & comments | + | XML docs are accurate where present; `ManagementServiceOptions.CommandTimeout` and the `ResolveRolesCommand` path are undocumented dead code (010, 011). |


## Findings

### ManagementService-001 — Remote-query and debug-snapshot handlers bypass site-scope enforcement

| | |
|--|--|
| Severity | High |
| Category | Security |
| Status | Open |
| Location | `src/ScadaLink.ManagementService/ManagementActor.cs:1465`, `:1481`, `:1493`, `:641`, `:649` |

**Description**

The design document (`Component-ManagementService.md`, Authorization section) states that for
Deployment users "Site scoping is enforced for site-scoped Deployment users". It also lists
"debug snapshot, parked message queries, site event log queries" among the Deployment-role
operations.

However, none of the following handlers calls `EnforceSiteScope` or
`EnforceSiteScopeForInstance`:

- `HandleQueryEventLogs`
- `HandleQueryParkedMessages`
- `HandleDebugSnapshot`
- `HandleRetryParkedMessage`
- `HandleDiscardParkedMessage`

A Deployment user scoped to site A can therefore query the event logs and parked messages of
site B and retry or discard another site's parked messages. They can also pull a debug
snapshot of any instance. All of this needs only a different `SiteIdentifier` or
`InstanceId`. This is an authorization bypass with no workaround.

**Recommendation**

In each of these handlers, resolve the target site and enforce site scope before delegating
to `CommunicationService`:

- For the `SiteIdentifier`-keyed handlers, look up the `Site` by identifier and enforce
  against `Site.Id`.
- For `DebugSnapshotCommand`, the instance is already loaded, so call
  `EnforceSiteScope(user, instance.SiteId)`.

Both cases require threading the `AuthenticatedUser` into these handlers; it is currently
dropped.

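
The resolve-then-enforce order for the `SiteIdentifier`-keyed handlers could look roughly like the sketch below. It is a minimal, self-contained illustration, not the module's code. The assumptions are:

- `AuthenticatedUser`, `Site`, and `SiteScopeViolationException` are simplified stand-ins for the real types.
- `SiteScope.ResolveAndEnforce` is a hypothetical helper.
- `PermittedSiteIds` holds site IDs as strings (the test helper passes `Array.Empty<string>()`).
- An empty set means the user is not site-scoped.

```csharp
#nullable enable
using System;
using System.Collections.Generic;
using System.Globalization;
using System.Linq;

// Simplified stand-ins for the real ScadaLink types (assumed shapes, for illustration only).
public sealed record AuthenticatedUser(string Username, IReadOnlyCollection<string> PermittedSiteIds);
public sealed record Site(int Id, string SiteIdentifier);

public sealed class SiteScopeViolationException : Exception
{
    public SiteScopeViolationException(string message) : base(message) { }
}

public static class SiteScope
{
    // Assumption: an empty PermittedSiteIds set means "not site-scoped" (all sites allowed).
    public static void EnforceSiteScope(AuthenticatedUser user, int siteId)
    {
        if (user.PermittedSiteIds.Count == 0)
            return;

        string key = siteId.ToString(CultureInfo.InvariantCulture);
        if (!user.PermittedSiteIds.Contains(key))
            throw new SiteScopeViolationException(
                $"User '{user.Username}' is not permitted to access site {siteId}.");
    }

    // Hypothetical helper for SiteIdentifier-keyed commands (event logs, parked messages).
    // It resolves the Site first and enforces against Site.Id. Only then does it hand the
    // site back to the caller, which would go on to delegate to CommunicationService.
    public static Site ResolveAndEnforce(
        AuthenticatedUser user, string siteIdentifier, IEnumerable<Site> sites)
    {
        Site site = sites.FirstOrDefault(s => s.SiteIdentifier == siteIdentifier)
            ?? throw new KeyNotFoundException($"Unknown site '{siteIdentifier}'.");

        EnforceSiteScope(user, site.Id);
        return site;
    }
}
```

The actor already maps `SiteScopeViolationException` to `Unauthorized`. Throwing it from the handler should therefore be enough to turn an out-of-scope request into a `ManagementUnauthorized` reply.

One choice to settle is whether an unknown identifier and an out-of-scope one return distinguishable errors. Returning the same error for both avoids revealing which site identifiers exist.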

**Resolution**

_Unresolved._

### ManagementService-002 — Single-entity query handlers leak data across site scope

| | |
|--|--|
| Severity | High |
| Category | Security |
| Status | Open |
| Location | `src/ScadaLink.ManagementService/ManagementActor.cs:510`, `:673`, `:733`, `:774`, `:631`, `:624` |

**Description**

`HandleListInstances` and `HandleListSites` correctly filter their results by the user's
`PermittedSiteIds`, but the single-entity query handlers do not. `HandleGetInstance`,
`HandleGetSite`, `HandleListAreas`, and `HandleGetDataConnection` fetch by ID with no
site-scope check. A site-scoped Deployment user can therefore read any instance, site, area
tree, or data connection by ID, even when its site is outside their scope.

Filtering the list endpoints but not the get-by-ID endpoints undermines the scoping model.
(`HandleGetDeploymentDiff` and `HandleListInstanceAlarmOverrides` do enforce scope, which
indicates that the omission elsewhere is unintentional.)

**Recommendation**

Apply `EnforceSiteScopeForInstance` in `HandleGetInstance`. In `HandleGetSite`,
`HandleListAreas`, and `HandleGetDataConnection`, apply `EnforceSiteScope` against the
resolved site ID. For data connections, scope by the connection's `SiteId`.

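
One way to stop the list and get-by-ID paths from drifting apart is to route both through a single scope predicate. The sketch below shows this for instances only. The assumptions are:

- `InstanceQueries`, `IsPermitted`, and the record shapes are hypothetical.
- `PermittedSiteIds` holds site IDs as strings.
- An empty set means the user is not site-scoped.

```csharp
#nullable enable
using System;
using System.Collections.Generic;
using System.Globalization;
using System.Linq;

// Hypothetical, simplified shapes for illustration.
public sealed record AuthenticatedUser(IReadOnlyCollection<string> PermittedSiteIds);
public sealed record Instance(int Id, int SiteId, string Name);

public sealed class SiteScopeViolationException : Exception
{
    public SiteScopeViolationException(string message) : base(message) { }
}

public sealed class InstanceQueries
{
    private readonly IReadOnlyList<Instance> _instances;

    public InstanceQueries(IReadOnlyList<Instance> instances) => _instances = instances;

    // The single scope rule shared by the list and get-by-id paths.
    // Assumption: an empty PermittedSiteIds set means the user is not site-scoped.
    private static bool IsPermitted(AuthenticatedUser user, int siteId) =>
        user.PermittedSiteIds.Count == 0 ||
        user.PermittedSiteIds.Contains(siteId.ToString(CultureInfo.InvariantCulture));

    // List path: silently drop out-of-scope rows.
    public IReadOnlyList<Instance> ListInstances(AuthenticatedUser user) =>
        _instances.Where(i => IsPermitted(user, i.SiteId)).ToList();

    // Get-by-id path: same rule, but an out-of-scope hit is an authorization failure.
    public Instance? GetInstance(AuthenticatedUser user, int instanceId)
    {
        Instance? instance = _instances.FirstOrDefault(i => i.Id == instanceId);
        if (instance is not null && !IsPermitted(user, instance.SiteId))
            throw new SiteScopeViolationException(
                $"Instance {instanceId} belongs to a site outside the caller's scope.");
        return instance;
    }
}
```

The same predicate would back `HandleGetSite`, `HandleListAreas`, and `HandleGetDataConnection`, keyed on the site ID or on the connection's `SiteId`.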

**Resolution**

_Unresolved._

### ManagementService-003 — DebugStreamHub.SubscribeInstance performs no per-instance authorization

| | |
|--|--|
| Severity | High |
| Category | Security |
| Status | Open |
| Location | `src/ScadaLink.ManagementService/DebugStreamHub.cs:104` |

**Description**

`OnConnectedAsync` authenticates the WebSocket connection and verifies that the caller holds
the `Deployment` role. `SubscribeInstance(int instanceId)`, however, accepts any instance ID
and starts a stream without checking that the authenticated user is scoped to that
instance's site.

A site-scoped Deployment user can therefore subscribe to the live debug stream (attribute
values, alarm states) of an instance belonging to a site outside their scope. This is the
streaming equivalent of findings 001 and 002.

**Recommendation**

Resolve the instance's site inside `SubscribeInstance`, and reject the subscription if the
authenticated user's permitted-site set does not include it. The identity established in
`OnConnectedAsync` must be persisted on the connection (e.g. in `Context.Items`) so that
`SubscribeInstance` can read it.

**Resolution**

_Unresolved._

### ManagementService-004 — Actor offloads work to Task.Run instead of using PipeTo

| | |
|--|--|
| Severity | Medium |
| Category | Akka.NET conventions |
| Status | Open |
| Location | `src/ScadaLink.ManagementService/ManagementActor.cs:61` |

**Description**

`HandleEnvelope` runs every command on a thread-pool thread via `Task.Run(async () => ...)`
and replies from inside the continuation. This is the anti-pattern the project's Akka.NET
conventions warn against. The canonical approach is to start the async work and `PipeTo` its
result back to `Self`/`Sender`.

`Sender` is correctly copied to a local before the closure, but the current code still has
three problems:

- (a) It lets multiple commands execute fully concurrently with no actor-thread
  serialization. The actor provides no ordering or back-pressure guarantees and is an actor
  in name only.
- (b) It drains its mailbox immediately, so in-flight work cannot be paused, supervised, or
  limited by a bounded mailbox.
- (c) It is shielded from synchronous faults only because every path sits inside the
  try/catch. Any future code path that throws synchronously before the `Task.Run` body
  would escape it.

**Recommendation**

Replace `Task.Run` with a method that returns the `Task`. `PipeTo` the mapped result
(`ManagementSuccess`/`ManagementError`/`ManagementUnauthorized`) back to the captured sender,
and map faults in the `PipeTo` failure continuation.

`PipeTo` on its own does not serialize execution. The actor returns to its mailbox as soon
as the task is started, so the I/O of several commands can still overlap. If
one-command-at-a-time ordering is the goal, use `ReceiveAsync`, which suspends mailbox
processing until the handler's task completes. If genuine parallelism is wanted instead,
make it explicit with a router or dispatcher rather than ad-hoc `Task.Run`.

**Resolution**

_Unresolved._

### ManagementService-005 — ManagementActor declares no supervision strategy

| | |
|--|--|
| Severity | Low |
| Category | Akka.NET conventions |
| Status | Open |
| Location | `src/ScadaLink.ManagementService/ManagementActor.cs:33` |

**Description**

The project conventions call for explicit supervision strategies (Resume for coordinator
actors). `ManagementActor` is a long-lived, coordinator-style actor, but it overrides no
`SupervisorStrategy` and defines no `PreRestart`/`PostRestart` behaviour.

A supervisor strategy governs failures of an actor's children. Because this actor spawns no
children, the strategy is not exercised today. An explicit strategy should still be declared
for clarity and to match the documented convention. It will also matter if children are
added later (e.g. if the fix for finding 004 introduces worker actors).

**Recommendation**

Add an explicit `protected override SupervisorStrategy SupervisorStrategy()` that returns a
Resume-based strategy, consistent with the other central coordinator actors.

**Resolution**

_Unresolved._

### ManagementService-006 — JsonDocument instances never disposed in the HTTP endpoint

| | |
|--|--|
| Severity | Medium |
| Category | Performance & resource management |
| Status | Open |
| Location | `src/ScadaLink.ManagementService/ManagementEndpoints.cs:83`, `:112` |

**Description**

`JsonDocument` is `IDisposable` because it rents its buffers from a shared `ArrayPool`.
`HandleRequest` leaves two documents undisposed:

- Line 83 parses the request body into `doc` and never disposes it.
- Line 112 (`JsonDocument.Parse("{}")`) allocates a second document inline that is also
  never disposed.

Every management HTTP call therefore takes pooled buffers that are never returned. The GC
eventually reclaims them, but the pool must allocate replacements, which increases GC
pressure and pool churn under load.

**Recommendation**

Wrap the parsed document in `using var doc = ...`. Make sure any `JsonElement` taken from it
is consumed (deserialized or cloned) before the document is disposed.

For the empty-payload fallback, avoid allocating a `JsonDocument` at all. Either deserialize
from the literal string `"{}"`, or restructure so that the fallback path does not parse a
throwaway document.

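
A minimal sketch of the disposal pattern, using only `System.Text.Json`. It targets .NET 6 or later, which provides the `JsonElement` overload of `JsonSerializer.Deserialize`. The following are assumptions for illustration; the real endpoint's body format and command types may differ:

- the request shape `{"command": ..., "payload": {...}}`
- the `GetSiteCommand` record
- `ManagementRequestParser`

Two points matter here. The payload is fully deserialized while the `using` document is still alive. The empty-payload fallback deserializes straight from the string `"{}"`.

```csharp
#nullable enable
using System;
using System.Text.Json;

// Hypothetical command record; the real command types live in ScadaLink.Commons.
public sealed record GetSiteCommand(int SiteId);

public static class ManagementRequestParser
{
    private static readonly JsonSerializerOptions Options = new() { PropertyNameCaseInsensitive = true };

    // Parses an (assumed) body shaped like {"command": "...", "payload": {...}}.
    public static (string Command, object Payload) Parse(string body, Type commandType)
    {
        // 'using var' returns the document's pooled buffers when the method exits.
        using var doc = JsonDocument.Parse(body);
        JsonElement root = doc.RootElement;

        string command = root.GetProperty("command").GetString()
            ?? throw new JsonException("'command' must be a string.");

        // Deserialize while 'doc' is still alive, so no JsonElement escapes this method.
        // The empty-payload fallback parses the string "{}" directly (no throwaway JsonDocument).
        object? payload = root.TryGetProperty("payload", out JsonElement payloadElement)
            ? JsonSerializer.Deserialize(payloadElement, commandType, Options)
            : JsonSerializer.Deserialize("{}", commandType, Options);

        return (command, payload ?? throw new JsonException("Payload deserialized to null."));
    }
}
```

Returning a deserialized object rather than a `JsonElement` keeps the document's lifetime local to the method. Nothing downstream can then read from a disposed document.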

**Resolution**

_Unresolved._

### ManagementService-007 — Inconsistent and cycle-prone serialization of repository entities

| | |
|--|--|
| Severity | Medium |
| Category | Code organization & conventions |
| Status | Open |
| Location | `src/ScadaLink.ManagementService/ManagementActor.cs:67`; `src/ScadaLink.ManagementService/ManagementEndpoints.cs:113` |

**Description**

The actor serializes every command result with `Newtonsoft.Json` (`JsonConvert.SerializeObject`),
while the HTTP endpoint deserializes payloads with `System.Text.Json`.

Beyond the inconsistency, `JsonConvert.SerializeObject` is applied directly to EF-backed
entities returned by repositories. Examples are `Site`, `DataConnection`, `NotificationList`
with its `Recipients` collection, and `Template` with its children. Under the default
Newtonsoft settings (`ReferenceLoopHandling.Error`), any bidirectional navigation property
raises a `JsonSerializationException` for a self-referencing loop. Even without cycles, the
output includes lazy-loaded and navigation state that the CLI does not expect.

**Recommendation**

Standardise on one serializer; the rest of the HTTP path uses `System.Text.Json`. Serialize
explicit DTOs or projections rather than EF entities, or else configure
`ReferenceLoopHandling.Ignore` and exclude navigation properties. Verify that handlers
returning rich entity graphs (`HandleGetTemplate`, `HandleUpdateNotificationList`) round-trip
correctly.

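
The DTO approach can be sketched with `System.Text.Json`, the serializer the recommendation suggests standardising on. The `NotificationList`/`Recipient` entity shapes, `NotificationListDto`, and `ResultSerializer` are hypothetical. They only model the bidirectional navigation described above.

By default, `System.Text.Json` also rejects such a cycle by throwing a `JsonException`. Projecting to a flat DTO is what makes the result serializable.

```csharp
#nullable enable
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.Json;

// Hypothetical EF-style entities with a bidirectional navigation (list <-> recipient).
public sealed class NotificationList
{
    public int Id { get; set; }
    public string Name { get; set; } = "";
    public List<Recipient> Recipients { get; } = new();
}

public sealed class Recipient
{
    public string Email { get; set; } = "";
    public NotificationList? List { get; set; } // back-reference: creates a cycle
}

// Flat DTO carrying only what the CLI needs; no navigation properties.
public sealed record NotificationListDto(int Id, string Name, IReadOnlyList<string> RecipientEmails);

public static class ResultSerializer
{
    // Web defaults: camelCase property names, case-insensitive reads.
    private static readonly JsonSerializerOptions Options = new(JsonSerializerDefaults.Web);

    public static NotificationListDto ToDto(NotificationList entity) =>
        new(entity.Id, entity.Name, entity.Recipients.Select(r => r.Email).ToList());

    // Serialize the projection, never the tracked entity graph.
    public static string Serialize(NotificationList entity) =>
        JsonSerializer.Serialize(ToDto(entity), Options);

    // Included only to demonstrate the failure mode of serializing the entity directly.
    public static string SerializeEntityDirectly(NotificationList entity) =>
        JsonSerializer.Serialize(entity, Options);
}
```

Projecting also pins down the wire shape the CLI sees. Adding a navigation property to an entity later cannot then silently change the output.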

**Resolution**

_Unresolved._

### ManagementService-008 — HandleResolveRoles constructs RoleMapper manually instead of via DI

| | |
|--|--|
| Severity | Low |
| Category | Correctness & logic bugs |
| Status | Open |
| Location | `src/ScadaLink.ManagementService/ManagementActor.cs:285` |

**Description**

Every other handler resolves its collaborators from the scoped `IServiceProvider`.
`HandleResolveRoles` instead calls
`new RoleMapper(sp.GetRequiredService<ISecurityRepository>())`, bypassing DI.

If `RoleMapper` ever gains another dependency, caching, or options, this hand-built instance
will silently diverge from the DI-registered one. It is also inconsistent with
`ManagementEndpoints`, which resolves `RoleMapper` from DI.

**Recommendation**

Resolve `RoleMapper` with `sp.GetRequiredService<RoleMapper>()`, like every other dependency.
If finding 011 removes the `ResolveRolesCommand` path, this handler goes away with it.

**Resolution**

_Unresolved._

### ManagementService-009 — Audit logging applied inconsistently across mutating handlers

| | |
|--|--|
| Severity | Medium |
| Category | Error handling & resilience |
| Status | Open |
| Location | `src/ScadaLink.ManagementService/ManagementActor.cs:357`, `:1134`, `:1085`, `:526`, `:1275` |

**Description**

The design doc states that "All mutating operations are audit logged."

Some handlers call `AuditAsync` explicitly: `HandleCreateInstance`, `HandleCreateSite`, and
all repository-direct external-system, notification, security, and area mutations. The
handlers that delegate to a domain service do **not**:

- `HandleCreateTemplate`/`HandleUpdateTemplate`/`HandleDeleteTemplate`
- all template-member handlers (`HandleAddAttribute` ... `HandleDeleteComposition`)
- the template-folder handlers
- the shared-script handlers
- `HandleDeployArtifacts` and `HandleDeployInstance`
- `HandleEnableInstance`/`Disable`/`Delete`
- the instance-binding and override handlers

This is correct only if every one of those services performs its own audit logging
internally. The mixed pattern makes that impossible to verify from this module alone. It
also creates a real risk of silent audit gaps for template authoring and deployment
operations.

**Recommendation**

Decide which layer owns auditing:

- either route all mutations through services that audit internally, and remove the
  explicit `AuditAsync` calls here;
- or audit uniformly in the actor after every successful mutation.

Document the chosen contract so the inconsistency cannot recur. Confirm that the template
and deployment services actually audit.

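
Suppose the actor is chosen as the auditing layer. A single wrapper that every mutating handler goes through then makes the contract visible in one place. The sketch below is illustrative only. `IAuditLog`, `InMemoryAuditLog`, and `AuditedMutation.RunAsync` are hypothetical names, and the module's real `AuditAsync` signature may differ. The wrapper audits only after the mutation completes successfully.

```csharp
#nullable enable
using System;
using System.Collections.Generic;
using System.Threading.Tasks;

// Hypothetical audit sink; the module's real AuditAsync signature may differ.
public interface IAuditLog
{
    Task WriteAsync(string user, string action, string target);
}

// In-memory implementation used only to make the sketch runnable.
public sealed class InMemoryAuditLog : IAuditLog
{
    public List<string> Entries { get; } = new();

    public Task WriteAsync(string user, string action, string target)
    {
        Entries.Add($"{user}:{action}:{target}");
        return Task.CompletedTask;
    }
}

public static class AuditedMutation
{
    // Single choke point for mutating handlers: run the mutation, then audit it.
    // If the mutation throws, the exception propagates and no audit entry is written.
    public static async Task<T> RunAsync<T>(
        IAuditLog audit, string user, string action, string target, Func<Task<T>> mutation)
    {
        T result = await mutation();
        await audit.WriteAsync(user, action, target);
        return result;
    }
}
```

Writing the audit entry after the mutation commits still leaves one gap. A failure in the audit write itself can produce an unaudited change. If that matters, the audit record has to be written in the same transaction as the mutation, which argues for the service layer owning auditing instead.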

**Resolution**

_Unresolved._

### ManagementService-010 — ManagementServiceOptions.CommandTimeout is defined but never used

| | |
|--|--|
| Severity | Low |
| Category | Design-document adherence |
| Status | Open |
| Location | `src/ScadaLink.ManagementService/ManagementServiceOptions.cs:5`; `src/ScadaLink.ManagementService/ManagementEndpoints.cs:16` |

**Description**

`ManagementServiceOptions.CommandTimeout` is bound from configuration in
`ServiceCollectionExtensions`, but no code reads it. The HTTP endpoint instead hard-codes
`AskTimeout = TimeSpan.FromSeconds(30)`.

The design doc describes the options section as "Reserved for future configuration — e.g.,
command timeout overrides". Yet a concrete `CommandTimeout` property already exists and is
silently ignored. An operator who sets it in `appsettings.json` sees no effect.

**Recommendation**

Either consume `ManagementServiceOptions.CommandTimeout` in `ManagementEndpoints.HandleRequest`
by injecting `IOptions<ManagementServiceOptions>`, or remove the property until it is wired
up. Either way, the setting can no longer be configured with no effect.

**Resolution**

_Unresolved._

### ManagementService-011 — ResolveRolesCommand dispatch path is stale dead code

| | |
|--|--|
| Severity | Low |
| Category | Correctness & logic bugs |
| Status | Open |
| Location | `src/ScadaLink.ManagementService/ManagementActor.cs:273`, `:283` |

**Description**

The design doc states that the HTTP endpoint "collapses the CLI's previous two-step flow
(ResolveRoles + actual command) into a single HTTP round-trip". `ManagementEndpoints` does
perform LDAP authentication and role resolution itself before dispatching. The
`ResolveRolesCommand` case in `DispatchCommand` is therefore unreachable from the HTTP path.

It remains reachable only from a raw ClusterClient sender. Any caller able to send
`ResolveRolesCommand` can enumerate the role mappings for arbitrary LDAP groups with no role
requirement (`GetRequiredRole` returns null for it). This is a minor information-disclosure
surface on a path the design says no longer exists.

**Recommendation**

If the two-step flow is genuinely retired, remove the `ResolveRolesCommand` dispatch case,
its handler, and the message class. If it must remain for non-HTTP clients, document why.
Also confirm that exposing role-mapping data to unauthenticated callers is intended.

**Resolution**

_Unresolved._

### ManagementService-012 — ManagementEnvelope carries a loosely-typed object payload

| | |
|--|--|
| Severity | Low |
| Category | Akka.NET conventions |
| Status | Open |
| Location | `src/ScadaLink.Commons/Messages/Management/ManagementEnvelope.cs:7`; `src/ScadaLink.ManagementService/ManagementActor.cs:132` |

**Description**

`ManagementEnvelope.Command` is typed `object`. The actor therefore relies on a large,
open-ended `switch` with a `NotSupportedException` default for unknown types.

The individual command records are immutable, but an `object` payload gives the compiler
nothing to check. Adding a new command record produces no signal that `DispatchCommand` (and
`GetRequiredRole`) need updating. A misspelled or unregistered command surfaces only as a
runtime exception. The message contract is also harder to evolve safely under the
additive-only rule.

**Recommendation**

Introduce a marker interface (e.g. `IManagementCommand`) implemented by every command record,
and type the envelope payload as that interface. This documents the contract, narrows what
the envelope can carry, and keeps `ManagementCommandRegistry`'s reflection scan precise.

The C# compiler does not check a `switch` over an interface for exhaustiveness. Pair the
marker with an analyzer, or with a test that confirms every implementing type has a dispatch
case and a required role.

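
The marker-interface shape can be sketched as follows. `IManagementCommand` is the name the finding proposes. The following are hypothetical:

- the two command records
- the single-field `ManagementEnvelope` (other envelope fields omitted)
- `ManagementCommandScan`

The `_` arm is still required. The gain is that the set of command types becomes enumerable through the marker.

```csharp
#nullable enable
using System;
using System.Linq;
using System.Reflection;

// Marker interface proposed by the finding.
public interface IManagementCommand { }

// Hypothetical command records; the real ones live in ScadaLink.Commons.
public sealed record GetSiteCommand(int SiteId) : IManagementCommand;
public sealed record DeleteInstanceCommand(int InstanceId) : IManagementCommand;

// Envelope payload typed as the marker instead of object (other envelope fields omitted).
public sealed record ManagementEnvelope(IManagementCommand Command);

public static class ManagementCommandScan
{
    // Registry-style scan: every concrete class implementing the marker.
    public static Type[] FindCommandTypes(Assembly assembly) =>
        assembly.GetTypes()
            .Where(t => t.IsClass && !t.IsAbstract && typeof(IManagementCommand).IsAssignableFrom(t))
            .OrderBy(t => t.Name, StringComparer.Ordinal)
            .ToArray();

    // Dispatch still needs a default arm: the compiler cannot see every implementer.
    public static string Describe(ManagementEnvelope envelope) => envelope.Command switch
    {
        GetSiteCommand c => $"get site {c.SiteId}",
        DeleteInstanceCommand c => $"delete instance {c.InstanceId}",
        _ => throw new NotSupportedException(envelope.Command.GetType().Name),
    };
}
```

A unit test can then iterate `FindCommandTypes` and assert that every type has a `GetRequiredRole` mapping and a dispatch case. That catches the missing-update problem described above during the test run rather than in production.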

**Resolution**

_Unresolved._

### ManagementService-013 — No tests for site-scope enforcement, the HTTP endpoint, or DebugStreamHub

| | |
|--|--|
| Severity | Medium |
| Category | Testing coverage |
| Status | Open |
| Location | `tests/ScadaLink.ManagementService.Tests/ManagementActorTests.cs:1` |

**Description**

`ManagementActorTests` thoroughly covers role-based authorization, success/error mapping,
and correlation IDs. Several critical paths, however, are untested:

- (a) Site-scope enforcement. `EnforceSiteScope`/`EnforceSiteScopeForInstance` and the
  `SiteScopeViolationException` -> `Unauthorized` mapping have no test, which is why the
  gaps in findings 001 and 002 went unnoticed.
- (b) `ManagementEndpoints`. Basic Auth decoding, malformed-header handling, LDAP/role
  resolution, command deserialization, and HTTP status mapping have no coverage.
- (c) `DebugStreamHub` authentication, its subscribe/unsubscribe lifecycle, and
  `ManagementCommandRegistry.Resolve`.

The `Envelope` test helper always passes `Array.Empty<string>()` for permitted sites, so no
test ever exercises a site-scoped user.

**Recommendation**

Add tests that exercise a site-scoped Deployment user against in-scope and out-of-scope
targets for instance and site operations, asserting `ManagementUnauthorized` on violations.
Add `WebApplicationFactory`-based tests for `ManagementEndpoints` covering auth failures,
malformed bodies, unknown commands, and the 200/400/401/403/504 status mappings.

**Resolution**

_Unresolved._