# Code Review — ManagementService

| Field | Value |
|-------|-------|
| Module | `src/ScadaLink.ManagementService` |
| Design doc | `docs/requirements/Component-ManagementService.md` |
| Status | Reviewed |
| Last reviewed | 2026-05-16 |
| Reviewer | claude-agent |
| Commit reviewed | `9c60592` |
| Open findings | 13 |

## Summary

The ManagementService module is a thin command-dispatch layer: a single `ManagementActor` fronts every administrative operation, an HTTP `POST /management` endpoint authenticates and forwards to it, and a SignalR `DebugStreamHub` provides real-time debug streaming. The code is consistently structured and the role-based authorization gate (`GetRequiredRole`) is broadly correct and well tested.

However, the review surfaced a significant **security theme**: site-scope enforcement, which the design document requires for instance- and site-targeted Deployment operations, is applied inconsistently. Several query handlers and all remote-query/debug handlers perform no site-scope check at all, allowing a site-scoped Deployment user to read or act on sites outside their scope. A second theme is **Akka.NET convention drift**: the actor offloads all work to `Task.Run` instead of using `PipeTo`, declares no supervision strategy, and the contract messages carry a loosely-typed `object` payload. There are also resource-management defects in the HTTP endpoint (`JsonDocument` instances never disposed) and dead/unused configuration.

None of the findings are crash-class, but the site-scope gaps are High severity because they are a real authorization bypass with no workaround.

## Checklist coverage

| # | Category | Examined | Notes |
|---|----------|----------|-------|
| 1 | Correctness & logic bugs | + | `HandleResolveRoles` builds `RoleMapper` by hand; `ResolveRolesCommand` is a stale dispatch path. See 008, 011. |
| 2 | Akka.NET conventions | + | `Task.Run` instead of `PipeTo`, no supervision strategy, `object`-typed message payload. See 004, 005, 012. |
| 3 | Concurrency & thread safety | + | Actor is stateless so `Task.Run` does not corrupt state, but it defeats actor-thread serialization (004). `Sender` correctly captured to a local before the closure. |
| 4 | Error handling & resilience | + | Exceptions are caught and mapped uniformly; `SiteScopeViolationException` mapped to `Unauthorized`. Audit-logging consistency issue noted in 009. |
| 5 | Security | + | Site-scope enforcement missing on query/remote/debug paths. See 001, 002, 003. |
| 6 | Performance & resource management | + | `JsonDocument` instances never disposed in the HTTP endpoint. See 006. |
| 7 | Design-document adherence | + | Design doc states remote queries enforce site scoping; code does not. `ManagementServiceOptions` reserved-for-future config is unused. See 001, 010. |
| 8 | Code organization & conventions | + | Mixed serializers (Newtonsoft in actor, System.Text.Json in endpoint); inconsistent audit logging across mutations. See 007, 009. |
| 9 | Testing coverage | + | Authorization is well covered; site-scope enforcement, the HTTP endpoint, `DebugStreamHub`, and remote-query handlers have no tests. See 013. |
| 10 | Documentation & comments | + | XML docs are accurate where present; `ManagementServiceOptions` and `ResolveRolesCommand` paths are undocumented dead code (010, 011). |

## Findings

### ManagementService-001 — Remote-query and debug-snapshot handlers bypass site-scope enforcement

| | |
|--|--|
| Severity | High |
| Category | Security |
| Status | Open |
| Location | `src/ScadaLink.ManagementService/ManagementActor.cs:1465`, `:1481`, `:1493`, `:641`, `:649` |

**Description**

The design document (`Component-ManagementService.md`, Authorization section) states that for Deployment users "Site scoping is enforced for site-scoped Deployment users" and lists "debug snapshot, parked message queries, site event log queries" among the Deployment-role operations.
`HandleQueryEventLogs`, `HandleQueryParkedMessages`, `HandleDebugSnapshot`, `HandleRetryParkedMessage`, and `HandleDiscardParkedMessage` make no call to `EnforceSiteScope` or `EnforceSiteScopeForInstance`. A Deployment user scoped to site A can therefore query event logs / parked messages of site B, retry or discard another site's parked messages, and pull a debug snapshot of any instance simply by supplying a different `SiteIdentifier` or `InstanceId`. This is an authorization bypass with no workaround.

**Recommendation**

In each of these handlers resolve the target site and call site-scope enforcement before delegating to `CommunicationService`. For the `SiteIdentifier`-keyed handlers, look up the `Site` by identifier and enforce against `Site.Id`; for `DebugSnapshotCommand` the instance is already loaded, so call `EnforceSiteScope(user, instance.SiteId)` (which requires threading `AuthenticatedUser` into these handlers, currently dropped).

**Resolution**

_Unresolved._

### ManagementService-002 — Single-entity query handlers leak data across site scope

| | |
|--|--|
| Severity | High |
| Category | Security |
| Status | Open |
| Location | `src/ScadaLink.ManagementService/ManagementActor.cs:510`, `:673`, `:733`, `:774`, `:631`, `:624` |

**Description**

`HandleListInstances` and `HandleListSites` correctly filter their results by the user's `PermittedSiteIds`, but the single-entity query handlers do not. `HandleGetInstance`, `HandleGetSite`, `HandleListAreas`, and `HandleGetDataConnection` fetch by ID with no site-scope check, so a site-scoped Deployment user can read any instance, site, area tree, or data connection by ID even though that site is excluded from their scope. The list endpoints having a filter while the get-by-id endpoints do not is an inconsistency that undermines the scoping model. (`HandleGetDeploymentDiff` and `HandleListInstanceAlarmOverrides` do enforce scope, confirming the omission elsewhere is unintentional.)

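
To make the missing check concrete, here is a minimal sketch of a get-by-id path that enforces scope before returning data. It is not the module's actual code: `EnforceSiteScope`, `SiteScopeViolationException`, and `PermittedSiteIds` are names taken from this review, while the shape of `AuthenticatedUser`, the rule that an empty permitted-site list means an unscoped user, and the dictionary standing in for the repository are assumptions made for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical shape: the real AuthenticatedUser may carry more fields.
public sealed record AuthenticatedUser(string Username, IReadOnlyCollection<int> PermittedSiteIds);

public sealed class SiteScopeViolationException : Exception
{
    public SiteScopeViolationException(string user, int siteId)
        : base($"User '{user}' is not permitted to access site {siteId}.") { }
}

public static class SiteScope
{
    // Assumption: an empty PermittedSiteIds list means the user is not site-scoped.
    public static void EnforceSiteScope(AuthenticatedUser user, int siteId)
    {
        if (user.PermittedSiteIds.Count == 0) return;
        if (!user.PermittedSiteIds.Contains(siteId))
            throw new SiteScopeViolationException(user.Username, siteId);
    }

    // Get-by-id pattern: load the entity, enforce against its site, then return it.
    // The dictionary is a stand-in for the instance repository.
    public static string GetInstanceName(
        AuthenticatedUser user,
        int instanceId,
        IReadOnlyDictionary<int, (int SiteId, string Name)> instances)
    {
        var instance = instances[instanceId];
        EnforceSiteScope(user, instance.SiteId);
        return instance.Name;
    }
}
```

With a user scoped to site 1, a lookup of an instance on site 1 returns normally, while a lookup of an instance on site 2 throws `SiteScopeViolationException`, which the actor already maps to `Unauthorized`.
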

**Recommendation**

Apply `EnforceSiteScopeForInstance` in `HandleGetInstance`, and `EnforceSiteScope` against the resolved site ID in `HandleGetSite`, `HandleListAreas`, and `HandleGetDataConnection` (for data connections, scope by the connection's `SiteId`).

**Resolution**

_Unresolved._

### ManagementService-003 — DebugStreamHub.SubscribeInstance performs no per-instance authorization

| | |
|--|--|
| Severity | High |
| Category | Security |
| Status | Open |
| Location | `src/ScadaLink.ManagementService/DebugStreamHub.cs:104` |

**Description**

`OnConnectedAsync` authenticates the WebSocket connection and verifies the caller holds the `Deployment` role, but `SubscribeInstance(int instanceId)` accepts any instance ID and starts a stream without checking that the authenticated user is scoped to that instance's site. A site-scoped Deployment user can therefore subscribe to the live debug stream (attribute values, alarm states) of an instance belonging to a site outside their scope. This is the streaming equivalent of findings 001 and 002.

**Recommendation**

Resolve the instance's site inside `SubscribeInstance` and reject the subscription if the authenticated user's permitted-site set does not include it. The authenticated identity established in `OnConnectedAsync` must be persisted on the connection (e.g. in `Context.Items`) so it is available to `SubscribeInstance`.

**Resolution**

_Unresolved._

### ManagementService-004 — Actor offloads work to Task.Run instead of using PipeTo

| | |
|--|--|
| Severity | Medium |
| Category | Akka.NET conventions |
| Status | Open |
| Location | `src/ScadaLink.ManagementService/ManagementActor.cs:61` |

**Description**

`HandleEnvelope` runs every command on a thread-pool thread via `Task.Run(async () => ...)` and replies from inside the continuation. This is the anti-pattern the project's Akka.NET conventions warn against; the canonical approach is to start the async work and `PipeTo` its result back to `Self`/`Sender`.
Although `Sender` is correctly copied to a local before the closure, the current code: (a) lets multiple commands execute fully concurrently with no actor-thread serialization, so the actor provides no ordering or back-pressure guarantees and is an actor in name only; (b) cannot be paused, supervised, or made to honour a mailbox bound; (c) is shielded from synchronous faults only because every path is inside the try/catch, so any future code path that throws synchronously before the `Task.Run` body would escape it.

**Recommendation**

Replace `Task.Run` with a method that returns the `Task` and `PipeTo` the mapped result (`ManagementSuccess`/`ManagementError`/`ManagementUnauthorized`) back to the captured sender, mapping faults in the `PipeTo` failure continuation. If genuine parallelism is desired, make that explicit with a router/dispatcher rather than ad-hoc `Task.Run`.

**Resolution**

_Unresolved._

### ManagementService-005 — ManagementActor declares no supervision strategy

| | |
|--|--|
| Severity | Low |
| Category | Akka.NET conventions |
| Status | Open |
| Location | `src/ScadaLink.ManagementService/ManagementActor.cs:33` |

**Description**

The project conventions call for explicit supervision strategies (Resume for coordinator actors). `ManagementActor` is a long-lived coordinator-style actor but overrides no `SupervisorStrategy` and defines no `PreRestart`/`PostRestart` behaviour. In practice it spawns no children so the default strategy is rarely exercised, but an explicit strategy should still be declared for clarity and to match the documented convention; it also matters if children are added later (e.g. if finding 004 introduces worker actors).

**Recommendation**

Add an explicit `protected override SupervisorStrategy SupervisorStrategy()` returning a Resume-based strategy, consistent with other central coordinator actors.

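
As a rough illustration of that override, the sketch below declares a Resume-for-everything `OneForOneStrategy`. It assumes the Akka.NET package is referenced; the class name `ManagementActorSketch`, the static `BuildStrategy` helper, and the retry/time-window values are hypothetical, and the real actor's constructor dependencies and message handlers are omitted. Whether every exception type should resume is a policy choice for the project, not something this review settles.

```csharp
using System;
using Akka.Actor;

// Sketch only: stands in for ManagementActor, with handlers and dependencies omitted.
public class ManagementActorSketch : ReceiveActor
{
    // Hypothetical helper so the decider can be inspected without starting an ActorSystem.
    public static SupervisorStrategy BuildStrategy() =>
        new OneForOneStrategy(
            maxNrOfRetries: 10,                         // illustrative value
            withinTimeRange: TimeSpan.FromMinutes(1),   // illustrative value
            localOnlyDecider: _ => Directive.Resume);   // keep the failing child and its state

    // Applies to children of this actor; it has none today, so this mainly documents intent.
    protected override SupervisorStrategy SupervisorStrategy() => BuildStrategy();
}
```
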

**Resolution**

_Unresolved._

### ManagementService-006 — JsonDocument instances never disposed in the HTTP endpoint

| | |
|--|--|
| Severity | Medium |
| Category | Performance & resource management |
| Status | Open |
| Location | `src/ScadaLink.ManagementService/ManagementEndpoints.cs:83`, `:112` |

**Description**

`JsonDocument` is `IDisposable` because it rents its buffers from pooled memory (`ArrayPool`). `HandleRequest` parses the request body into `doc` at line 83 and never disposes it, and line 112 (`JsonDocument.Parse("{}")`) allocates a second document inline that is also never disposed. Every management HTTP call therefore fails to return its rented buffers to the pool, increasing GC pressure and pool churn under load.

**Recommendation**

Wrap the parsed document in `using var doc = ...`. For the empty-payload fallback, avoid allocating a `JsonDocument` entirely: deserialize from the literal string `"{}"`/an empty object, or restructure so the fallback path does not parse a throwaway document.

**Resolution**

_Unresolved._

### ManagementService-007 — Inconsistent and cycle-prone serialization of repository entities

| | |
|--|--|
| Severity | Medium |
| Category | Code organization & conventions |
| Status | Open |
| Location | `src/ScadaLink.ManagementService/ManagementActor.cs:67`; `src/ScadaLink.ManagementService/ManagementEndpoints.cs:113` |

**Description**

The actor serializes every command result with `Newtonsoft.Json` (`JsonConvert.SerializeObject`) while the HTTP endpoint deserializes payloads with `System.Text.Json`. Beyond the inconsistency, `JsonConvert.SerializeObject` is applied directly to EF-backed entities returned by repositories (e.g. `Site`, `DataConnection`, `NotificationList` with a `Recipients` collection, `Template` with children). With default Newtonsoft settings any bidirectional navigation property produces a `JsonSerializationException` for self-referencing loops, and even without cycles this serializes lazy/navigation state the CLI does not expect.

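
The DTO-projection alternative can be sketched in a few lines. The entity shapes below are simplified, hypothetical stand-ins for the real `Site` and `DataConnection` classes, `SiteDto` and `SiteProjection` are invented for illustration, and the example uses `System.Text.Json` on the assumption that the module standardises on it.

```csharp
#nullable enable
using System.Collections.Generic;
using System.Linq;
using System.Text.Json;

// Hypothetical entities with a bidirectional navigation (Site <-> DataConnection).
public class Site
{
    public int Id { get; set; }
    public string Name { get; set; } = "";
    public List<DataConnection> DataConnections { get; } = new();
}

public class DataConnection
{
    public int Id { get; set; }
    public string Name { get; set; } = "";
    public Site? Site { get; set; } // back-reference that creates the cycle
}

// Flat projection: only the fields a client needs, no navigation properties.
public sealed record SiteDto(int Id, string Name, int[] DataConnectionIds);

public static class SiteProjection
{
    public static SiteDto ToDto(Site site) =>
        new(site.Id, site.Name, site.DataConnections.Select(c => c.Id).ToArray());

    // Serializes the projection, never the entity graph itself.
    public static string Serialize(Site site) => JsonSerializer.Serialize(ToDto(site));
}
```

For a site with `Id = 1`, `Name = "Plant A"`, and one connection with `Id = 7` that points back at the site, `SiteProjection.Serialize` produces `{"Id":1,"Name":"Plant A","DataConnectionIds":[7]}`, whereas serializing the entity directly would walk into the self-referencing cycle.
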

**Recommendation**

Standardise on one serializer (the rest of the HTTP path uses `System.Text.Json`). Serialize explicit DTOs / projections rather than EF entities, or configure `ReferenceLoopHandling.Ignore` and ignore navigation properties. Verify that handlers returning rich entity graphs (`HandleGetTemplate`, `HandleUpdateNotificationList`) round-trip correctly.

**Resolution**

_Unresolved._

### ManagementService-008 — HandleResolveRoles constructs RoleMapper manually instead of via DI

| | |
|--|--|
| Severity | Low |
| Category | Correctness & logic bugs |
| Status | Open |
| Location | `src/ScadaLink.ManagementService/ManagementActor.cs:285` |

**Description**

Every other handler resolves its collaborators from the scoped `IServiceProvider`. `HandleResolveRoles` instead does `new RoleMapper(sp.GetRequiredService())`, bypassing DI. If `RoleMapper` ever gains a dependency, caching, or options, this hand-built instance silently diverges from the DI-registered one. It is also inconsistent with `ManagementEndpoints`, which resolves `RoleMapper` from DI.

**Recommendation**

Resolve `RoleMapper` via `sp.GetRequiredService()` like every other dependency.

**Resolution**

_Unresolved._

### ManagementService-009 — Audit logging applied inconsistently across mutating handlers

| | |
|--|--|
| Severity | Medium |
| Category | Error handling & resilience |
| Status | Open |
| Location | `src/ScadaLink.ManagementService/ManagementActor.cs:357`, `:1134`, `:1085`, `:526`, `:1275` |

**Description**

The design doc states "All mutating operations are audit logged." Some handlers call `AuditAsync` explicitly (`HandleCreateInstance`, `HandleCreateSite`, all repository-direct external-system/notification/security/area mutations), but the handlers that delegate to a domain service do **not**: `HandleCreateTemplate`/`HandleUpdateTemplate`/`HandleDeleteTemplate`, all template-member handlers (`HandleAddAttribute` ...
`HandleDeleteComposition`), template-folder handlers, shared-script handlers, `HandleDeployArtifacts`, `HandleDeployInstance`, `HandleEnableInstance`/`Disable`/`Delete`, and the instance-binding/override handlers. This is correct only if every one of those services performs its own audit logging internally; the mixed pattern makes that impossible to verify by reading this module and creates a real risk of silent audit gaps for template authoring and deployment operations.

**Recommendation**

Decide on one layer that owns auditing. Either route all mutations through services that audit internally (and remove the explicit `AuditAsync` calls here), or audit uniformly in the actor after every successful mutation. Document the chosen contract so the inconsistency cannot recur, and confirm template/deployment services actually audit.

**Resolution**

_Unresolved._

### ManagementService-010 — ManagementServiceOptions.CommandTimeout is defined but never used

| | |
|--|--|
| Severity | Low |
| Category | Design-document adherence |
| Status | Open |
| Location | `src/ScadaLink.ManagementService/ManagementServiceOptions.cs:5`; `src/ScadaLink.ManagementService/ManagementEndpoints.cs:16` |

**Description**

`ManagementServiceOptions.CommandTimeout` is bound from configuration in `ServiceCollectionExtensions`, but no code reads it. The HTTP endpoint instead hard-codes `AskTimeout = TimeSpan.FromSeconds(30)`. The design doc describes the options section as "Reserved for future configuration — e.g., command timeout overrides", yet a concrete `CommandTimeout` property already exists and is silently ignored, so an operator who sets it in `appsettings.json` gets no effect.

**Recommendation**

Either consume `ManagementServiceOptions.CommandTimeout` in `ManagementEndpoints.HandleRequest` (inject `IOptions`), or remove the property until it is wired up so configuration cannot be set with no effect.

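
One way the first option might look is sketched below, assuming the `Microsoft.Extensions.Options` package. Only the `CommandTimeout` property name and the 30-second value come from this review; the property's type and default, the `AskTimeoutProvider` helper, and the fallback rule for non-positive values are assumptions. The real endpoint could equally take `IOptions<ManagementServiceOptions>` as a handler parameter.

```csharp
using System;
using Microsoft.Extensions.Options;

// Property name from the review; the TimeSpan type and default are assumptions.
public sealed class ManagementServiceOptions
{
    public TimeSpan CommandTimeout { get; set; } = TimeSpan.FromSeconds(30);
}

// Hypothetical helper that replaces the hard-coded AskTimeout constant.
public sealed class AskTimeoutProvider
{
    private static readonly TimeSpan Fallback = TimeSpan.FromSeconds(30);
    private readonly ManagementServiceOptions _options;

    public AskTimeoutProvider(IOptions<ManagementServiceOptions> options) =>
        _options = options.Value;

    // Use the configured value; fall back to 30 s if it is zero or negative.
    public TimeSpan AskTimeout =>
        _options.CommandTimeout > TimeSpan.Zero ? _options.CommandTimeout : Fallback;
}
```

In a test or a console harness the provider can be built with `Options.Create(new ManagementServiceOptions { CommandTimeout = TimeSpan.FromSeconds(5) })`; in the host it would be resolved from DI after the existing configuration binding.
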

**Resolution**

_Unresolved._

### ManagementService-011 — ResolveRolesCommand dispatch path is stale dead code

| | |
|--|--|
| Severity | Low |
| Category | Correctness & logic bugs |
| Status | Open |
| Location | `src/ScadaLink.ManagementService/ManagementActor.cs:273`, `:283` |

**Description**

The design doc states the HTTP endpoint "collapses the CLI's previous two-step flow (ResolveRoles + actual command) into a single HTTP round-trip", and indeed `ManagementEndpoints` performs LDAP auth and role resolution itself before dispatching. The `ResolveRolesCommand` case in `DispatchCommand` is therefore unreachable from the HTTP path. It remains reachable only via a raw ClusterClient sender, but a caller able to send `ResolveRolesCommand` could enumerate role mappings for arbitrary LDAP groups with no role requirement (`GetRequiredRole` returns null for it), a minor information-disclosure surface for a path the design says no longer exists.

**Recommendation**

If the two-step flow is genuinely retired, remove `ResolveRolesCommand`, its handler, and the class. If it must remain for non-HTTP clients, document why and confirm exposing role-mapping data unauthenticated is intended.

**Resolution**

_Unresolved._

### ManagementService-012 — ManagementEnvelope carries a loosely-typed object payload

| | |
|--|--|
| Severity | Low |
| Category | Akka.NET conventions |
| Status | Open |
| Location | `src/ScadaLink.Commons/Messages/Management/ManagementEnvelope.cs:7`; `src/ScadaLink.ManagementService/ManagementActor.cs:132` |

**Description**

`ManagementEnvelope.Command` is typed `object`, so the actor relies on a large open-ended `switch` with a `NotSupportedException` default for unknown types.
While the individual command records are immutable, `object` defeats compile-time exhaustiveness: adding a new command record produces no compiler signal that `DispatchCommand` (and `GetRequiredRole`) need updating, and a typo or unregistered command surfaces only as a runtime exception. The message contract is also harder to evolve safely under the additive-only rule.

**Recommendation**

Introduce a marker interface (e.g. `IManagementCommand`) implemented by every command record and type the envelope payload as that interface. This documents the contract, lets analyzers flag unhandled cases, and keeps `ManagementCommandRegistry`'s reflection scan precise.

**Resolution**

_Unresolved._

### ManagementService-013 — No tests for site-scope enforcement, the HTTP endpoint, or DebugStreamHub

| | |
|--|--|
| Severity | Medium |
| Category | Testing coverage |
| Status | Open |
| Location | `tests/ScadaLink.ManagementService.Tests/ManagementActorTests.cs:1` |

**Description**

`ManagementActorTests` covers role-based authorization, success/error mapping, and correlation IDs thoroughly, but several critical paths are untested: (a) site-scope enforcement: `EnforceSiteScope`/`EnforceSiteScopeForInstance` and `SiteScopeViolationException` -> `Unauthorized` mapping have no test, which is why the gaps in findings 001/002 went unnoticed; (b) `ManagementEndpoints`: Basic Auth decoding, malformed-header handling, LDAP/role resolution, command deserialization, and HTTP status mapping have zero coverage; (c) `DebugStreamHub` authentication, subscribe/unsubscribe lifecycle, and `ManagementCommandRegistry.Resolve` are untested. The `Envelope` test helper always passes `Array.Empty()` for permitted sites, so no test ever exercises a site-scoped user.

**Recommendation**

Add tests that exercise a site-scoped Deployment user against in-scope and out-of-scope targets for instance and site operations, asserting `ManagementUnauthorized` on violations.
Add `WebApplicationFactory`-based tests for `ManagementEndpoints` covering auth failures, malformed bodies, unknown commands, and the 200/400/403/401/504 mappings.

**Resolution**

_Unresolved._