Joseph Doherty f93b7b99bb code-review: 2026-05-28 baseline re-review of all 23 modules at 1eb6e97
Re-applies the full 10-category checklist to every src/ project — including
first-time reviews of the four newer components (AuditLog, NotificationOutbox,
SiteCallAudit, Transport) — so the code-reviews/ index reflects today's
codebase rather than the 2026-05-16 baseline. 172 new Open findings (0
Critical, 18 High, 62 Medium, 92 Low); 481 findings total across 23 modules.

regen-readme.py now derives each module's Last reviewed + Commit from its
findings.md header instead of hard-coding 2026-05-16 / 9c60592, so future
single-module re-reviews show their own date in the Module Status table.
2026-05-28 02:55:47 -04:00

# Code Review — ManagementService
| Field | Value |
|-------|-------|
| Module | `src/ScadaLink.ManagementService` |
| Design doc | `docs/requirements/Component-ManagementService.md` |
| Status | Reviewed |
| Last reviewed | 2026-05-28 |
| Reviewer | claude-agent |
| Commit reviewed | `1eb6e97` |
| Open findings | 6 (1 Deferred — see ManagementService-012) |
## Summary
The ManagementService module is a thin command-dispatch layer: a single `ManagementActor`
fronts every administrative operation, an HTTP `POST /management` endpoint authenticates and
forwards to it, and a SignalR `DebugStreamHub` provides real-time debug streaming. The code
is consistently structured and the role-based authorization gate (`GetRequiredRole`) is
broadly correct and well tested. However, the review surfaced a significant **security
theme**: site-scope enforcement, which the design document requires for instance- and
site-targeted Deployment operations, is applied inconsistently — several query handlers and
all remote-query/debug handlers perform no site-scope check at all, allowing a site-scoped
Deployment user to read or act on sites outside their scope. A second theme is **Akka.NET
convention drift**: the actor offloads all work to `Task.Run` instead of using `PipeTo`,
declares no supervision strategy, and the contract messages carry a loosely-typed `object`
payload. There are also resource-management defects in the HTTP endpoint (`JsonDocument`
instances never disposed) and dead/unused configuration. None of the findings are
crash-class, but the site-scope gaps are High severity because they are a real
authorization bypass with no workaround.
#### Re-review 2026-05-17 (commit `39d737e`)
All thirteen prior findings remain correctly closed; the source under
`src/ScadaLink.ManagementService` is byte-identical between the previously reviewed state
and `39d737e` (the resolution commits of findings 001-013 are folded into the history at or
before `39d737e`). ManagementService-012 was re-checked and its **Deferred** status still
holds: `ManagementEnvelope.Command` is still typed `object`, and the marker-interface fix
still belongs in the Commons module, outside this module's edit scope — nothing has changed
to make it actionable here. This re-review re-ran the full 10-category checklist against the
current sources and surfaced **four new findings**. The dominant theme is the same
site-scope authorization gap that findings 001/002 closed: `HandleQueryDeployments`
(`QueryDeploymentsCommand`) was overlooked by that sweep and still performs no site-scope
enforcement, letting a site-scoped Deployment user read deployment history for any site
(014, High). The remaining three are lower severity: a non-atomic multi-override mutation
that can leave an instance partially modified after an error (015, Medium), raw exception
messages from unexpected faults being returned verbatim to HTTP callers (016, Low), and
`QueryDeploymentsCommand` having no test coverage at all (017, Low).
#### Re-review 2026-05-28 (commit `1eb6e97`)
All seventeen prior findings remain correctly closed; ManagementService-012 is still the
only Deferred entry (marker-interface on `ManagementEnvelope.Command` still belongs in the
Commons module). The module has grown substantially since the last review (`+1997 lines`):
the Transport (#24) bundle commands (`ExportBundle`/`PreviewBundle`/`ImportBundle`) have
been added to `ManagementActor`, and a new `AuditEndpoints.cs` (`/api/audit/query` and
`/api/audit/export`) ships alongside the existing `/management` endpoint. This re-review
re-ran the full 10-category checklist and surfaced **six new findings**. The dominant
theme is the same authorization gap that findings 001/002/003/014 closed for the
ManagementActor, now resurfacing in the new surfaces:
**QueryAuditLogCommand has no role gate at all** (018, High) — any authenticated user can
read the configuration audit log via `/management`, even though the parallel
`/api/audit/query` requires `OperationalAuditRoles`. The new `/api/audit/{query,export}`
endpoints build an `AuthenticatedUser` with `PermittedSiteIds` but never enforce site scope
(019, Medium) — although audit roles are not site-scoped by design, the user-supplied
`sourceSiteId` filter is honoured verbatim. `HandleUpdateSmtpConfig` returns the full
SmtpConfiguration entity (including the `Credentials` field, which can carry SMTP passwords
/ OAuth2 client secrets) in the response and audit row (020, Medium). The Transport (#24)
bundle commands have zero test coverage in `ManagementActorTests` (021, Medium) — neither
role gating nor success/error paths. The `Component-ManagementService.md` design doc is
stale on three fronts: it does not mention Transport bundle commands, the `/api/audit/*`
endpoints, or the now-wired `CommandTimeout` option (022, Low). Finally,
`HandleQueryDeployments` issues one `GetInstanceByIdAsync` per unique instance ID when
filtering for a site-scoped user — an N+1 read pattern on the unfiltered branch (023, Low).
## Checklist coverage
| # | Category | Examined | Notes |
|---|----------|----------|-------|
| 1 | Correctness & logic bugs | + | `HandleResolveRoles` builds `RoleMapper` by hand; `ResolveRolesCommand` is a stale dispatch path. See 008, 011. |
| 2 | Akka.NET conventions | + | `Task.Run` instead of `PipeTo`, no supervision strategy, `object`-typed message payload. See 004, 005, 012. |
| 3 | Concurrency & thread safety | + | Actor is stateless so `Task.Run` does not corrupt state, but it defeats actor-thread serialization (004). `Sender` correctly captured to a local before the closure. |
| 4 | Error handling & resilience | + | Exceptions are caught and mapped uniformly; `SiteScopeViolationException` mapped to `Unauthorized`. Audit-logging consistency issue noted in 009. |
| 5 | Security | + | Site-scope enforcement missing on query/remote/debug paths. See 001, 002, 003. |
| 6 | Performance & resource management | + | `JsonDocument` instances never disposed in the HTTP endpoint. See 006. |
| 7 | Design-document adherence | + | Design doc states remote queries enforce site scoping; code does not. `ManagementServiceOptions` reserved-for-future config is unused. See 001, 010. |
| 8 | Code organization & conventions | + | Mixed serializers (Newtonsoft in actor, System.Text.Json in endpoint); inconsistent audit logging across mutations. See 007, 009. |
| 9 | Testing coverage | + | Authorization is well covered; site-scope enforcement, the HTTP endpoint, `DebugStreamHub`, and remote-query handlers have no tests. See 013. |
| 10 | Documentation & comments | + | XML docs are accurate where present; `ManagementServiceOptions` and `ResolveRolesCommand` paths are undocumented dead code (010, 011). |
_Re-review (2026-05-28, `1eb6e97`):_
| # | Category | Examined | Notes |
|---|----------|----------|-------|
| 1 | Correctness & logic bugs | + | `HandleImportBundle` correctly dedupes resolutions per (entity,name); `ParseDocument` still allocates a `JsonDocument.Parse("{}")` on the failure path but the caller's `using` disposes it. No new defects. |
| 2 | Akka.NET conventions | + | PipeTo dispatch from 004 is intact; supervision strategy from 005 is intact; `Sender` correctly captured to local before PipeTo. No new findings. |
| 3 | Concurrency & thread safety | + | Bundle handlers `await` cleanly; `BundleSession` is not cleaned up if `PreviewAsync`/`ApplyAsync` throws, but that is an `IBundleImporter` contract concern outside this module. No new findings. |
| 4 | Error handling & resilience | + | `ManagementCommandException` from 016 is applied consistently across the new bundle handlers (curated `CryptographicException`/`ArgumentException` paths). No new findings. |
| 5 | Security | + | `QueryAuditLogCommand` has no role gate (018, High). New `/api/audit/*` endpoints build `PermittedSiteIds` but never enforce them (019, Medium). `HandleUpdateSmtpConfig` returns + audits `Credentials` verbatim (020, Medium). |
| 6 | Performance & resource management | + | `HandleQueryDeployments` unfiltered-with-scope branch is N+1 on instance lookups (023, Low). Request body up to 200 MB read into a single `string` in `HandleRequest` (acceptable per Transport bundle requirement). |
| 7 | Design-document adherence | + | `Component-ManagementService.md` is stale on Transport bundle commands, `/api/audit/*` endpoints, and the now-wired `CommandTimeout` (022, Low). |
| 8 | Code organization & conventions | + | `AuditEndpoints` duplicates the Basic Auth → LDAP → roles flow from `ManagementEndpoints` (~50 lines). Acknowledged in `AuditEndpoints` XML but worth tracking. No new finding raised. |
| 9 | Testing coverage | + | Transport bundle commands have zero `ManagementActorTests` coverage — neither role gating nor handler logic (021, Medium). |
| 10 | Documentation & comments | + | New `AuditEndpoints` XML doc is high quality. `Component-ManagementService.md` not updated for Transport/Audit endpoints (022 covers). |
## Findings
### ManagementService-001 — Remote-query and debug-snapshot handlers bypass site-scope enforcement
| | |
|--|--|
| Severity | High |
| Category | Security |
| Status | Resolved |
| Location | `src/ScadaLink.ManagementService/ManagementActor.cs:1465`, `:1481`, `:1493`, `:641`, `:649` |
**Description**
The design document (`Component-ManagementService.md`, Authorization section) states that for
Deployment users "Site scoping is enforced for site-scoped Deployment users" and lists
"debug snapshot, parked message queries, site event log queries" among the Deployment-role
operations. `HandleQueryEventLogs`, `HandleQueryParkedMessages`, `HandleDebugSnapshot`,
`HandleRetryParkedMessage`, and `HandleDiscardParkedMessage` make no call to `EnforceSiteScope`
or `EnforceSiteScopeForInstance`. A Deployment user scoped to site A can therefore query event
logs / parked messages of site B, retry or discard another site's parked messages, and pull a
debug snapshot of any instance simply by supplying a different `SiteIdentifier` or `InstanceId`.
This is an authorization bypass with no workaround.
**Recommendation**
In each of these handlers resolve the target site and call site-scope enforcement before
delegating to `CommunicationService`. For the `SiteIdentifier`-keyed handlers, look up the
`Site` by identifier and enforce against `Site.Id`; for `DebugSnapshotCommand` the instance
is already loaded — call `EnforceSiteScope(user, instance.SiteId)` (which requires threading
`AuthenticatedUser` into these handlers, currently dropped).
**Resolution**
Resolved 2026-05-16 (commit `<pending>`). Threaded `AuthenticatedUser` into
`HandleQueryEventLogs`, `HandleQueryParkedMessages`, `HandleRetryParkedMessage`,
`HandleDiscardParkedMessage`, and `HandleDebugSnapshot`; added an
`EnforceSiteScopeForIdentifier` helper that resolves the site by identifier and applies
`EnforceSiteScope`. `HandleDebugSnapshot` enforces against the already-loaded instance's
`SiteId`. Regression tests: `QueryEventLogs_OutOfScopeForSiteScopedUser_ReturnsUnauthorized`,
`QueryParkedMessages_OutOfScopeForSiteScopedUser_ReturnsUnauthorized`,
`RetryParkedMessage_OutOfScopeForSiteScopedUser_ReturnsUnauthorized`,
`DiscardParkedMessage_OutOfScopeForSiteScopedUser_ReturnsUnauthorized`,
`DebugSnapshot_OutOfScopeForSiteScopedUser_ReturnsUnauthorized`.
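The enforcement shape described in this resolution might look roughly like the sketch below. It is illustrative only: the `AuthenticatedUser` record, the `lookupSiteId` delegate (standing in for a repository call), and the exception constructor are hypothetical, and it assumes an empty `PermittedSiteIds` set denotes a system-wide user, as finding 003 describes. Role-based bypass (for example Admin) is omitted.

```csharp
#nullable enable
using System;
using System.Collections.Generic;

// Hypothetical stand-ins for the module's real types.
public sealed record AuthenticatedUser(string Username, IReadOnlySet<int> PermittedSiteIds);

public sealed class SiteScopeViolationException : Exception
{
    public SiteScopeViolationException(int siteId)
        : base($"Site {siteId} is outside the caller's permitted scope.") { }
}

public static class SiteScope
{
    // Assumption: an empty permitted set means the user is system-wide.
    public static void EnforceSiteScope(AuthenticatedUser user, int siteId)
    {
        if (user.PermittedSiteIds.Count == 0) return;
        if (!user.PermittedSiteIds.Contains(siteId))
            throw new SiteScopeViolationException(siteId);
    }

    // Resolves a site identifier to its numeric ID, then applies the same check.
    // `lookupSiteId` returns null when the identifier is unknown.
    public static void EnforceSiteScopeForIdentifier(
        AuthenticatedUser user, string siteIdentifier, Func<string, int?> lookupSiteId)
    {
        int? siteId = lookupSiteId(siteIdentifier);
        if (siteId is null)
            throw new KeyNotFoundException($"Unknown site '{siteIdentifier}'.");
        EnforceSiteScope(user, siteId.Value);
    }
}
```

Running the check before any call to `CommunicationService` means an out-of-scope request fails without touching the remote site.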
### ManagementService-002 — Single-entity query handlers leak data across site scope
| | |
|--|--|
| Severity | High |
| Category | Security |
| Status | Resolved |
| Location | `src/ScadaLink.ManagementService/ManagementActor.cs:510`, `:673`, `:733`, `:774`, `:631`, `:624` |
**Description**
`HandleListInstances` and `HandleListSites` correctly filter their results by the user's
`PermittedSiteIds`, but the single-entity query handlers do not. `HandleGetInstance`,
`HandleGetSite`, `HandleListAreas`, and `HandleGetDataConnection` fetch by ID with no
site-scope check, so a site-scoped Deployment user can read any instance, site, area tree,
or data connection by ID even though that site is excluded from their scope. The list
endpoints having a filter while the get-by-id endpoints do not is an inconsistency that
undermines the scoping model. (`HandleGetDeploymentDiff` and `HandleListInstanceAlarmOverrides`
do enforce scope, confirming the omission elsewhere is unintentional.)
**Recommendation**
Apply `EnforceSiteScopeForInstance` in `HandleGetInstance`, and `EnforceSiteScope` against
the resolved site ID in `HandleGetSite`, `HandleListAreas`, and `HandleGetDataConnection`
(for data connections, scope by the connection's `SiteId`).
**Resolution**
Resolved 2026-05-16 (commit `<pending>`). `HandleGetInstance`, `HandleGetSite`,
`HandleGetDataConnection` now take `AuthenticatedUser` and call `EnforceSiteScope` against
the resolved entity's site ID (instance `SiteId`, site `Id`, data-connection `SiteId`);
`HandleListAreas` enforces against the requested `SiteId` before querying. Regression tests:
`GetInstance_OutOfScopeForSiteScopedUser_ReturnsUnauthorized`,
`GetInstance_InScopeForSiteScopedUser_ReturnsSuccess`,
`GetSite_OutOfScopeForSiteScopedUser_ReturnsUnauthorized`,
`GetSite_OutOfScopeForAdminUser_ReturnsSuccess`,
`ListAreas_OutOfScopeForSiteScopedUser_ReturnsUnauthorized`,
`GetDataConnection_OutOfScopeForSiteScopedUser_ReturnsUnauthorized`.
### ManagementService-003 — DebugStreamHub.SubscribeInstance performs no per-instance authorization
| | |
|--|--|
| Severity | High |
| Category | Security |
| Status | Resolved |
| Location | `src/ScadaLink.ManagementService/DebugStreamHub.cs:104` |
**Description**
`OnConnectedAsync` authenticates the WebSocket connection and verifies the caller holds the
`Deployment` role, but `SubscribeInstance(int instanceId)` accepts any instance ID and starts
a stream without checking that the authenticated user is scoped to that instance's site. A
site-scoped Deployment user can therefore subscribe to the live debug stream (attribute
values, alarm states) of an instance belonging to a site outside their scope. This is the
streaming equivalent of findings 001/002.
**Recommendation**
Resolve the instance's site inside `SubscribeInstance` and reject the subscription if the
authenticated user's permitted-site set does not include it. The authenticated identity
established in `OnConnectedAsync` must be persisted on the connection (e.g. in
`Context.Items`) so it is available to `SubscribeInstance`.
**Resolution**
Resolved 2026-05-16 (commit `<pending>`). `OnConnectedAsync` now persists the resolved
roles and `PermittedSiteIds` in `Context.Items`. `SubscribeInstance` resolves the
instance's site via `ITemplateEngineRepository` and rejects the subscription (sending
`OnStreamTerminated`) when the new pure `DebugStreamHub.IsInstanceAccessAllowed` check
fails. The check grants access for the Admin role or system-wide Deployment (empty
permitted set) and otherwise requires the instance's site in the permitted set. Regression
tests: `IsInstanceAccessAllowed_SiteScopedUser_OutOfScopeInstance_Denied`,
`IsInstanceAccessAllowed_SiteScopedUser_InScopeInstance_Allowed`,
`IsInstanceAccessAllowed_SystemWideDeployment_AnySiteAllowed`,
`IsInstanceAccessAllowed_AdminRole_BypassesSiteScope`,
`IsInstanceAccessAllowed_AdminRoleCheck_IsCaseInsensitive`.
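The three rules the resolution lists for `IsInstanceAccessAllowed` can be expressed as a small pure function. The sketch below is an approximation under assumptions: the parameter list and the literal role name `"Admin"` are guesses at the shape, not the hub's actual signature.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class DebugStreamAccess
{
    // Pure check: no hub, repository, or connection state is touched.
    public static bool IsInstanceAccessAllowed(
        IEnumerable<string> roles, IReadOnlyCollection<int> permittedSiteIds, int instanceSiteId)
    {
        // Admin bypasses site scope; the role comparison is case-insensitive.
        if (roles.Any(r => string.Equals(r, "Admin", StringComparison.OrdinalIgnoreCase)))
            return true;

        // An empty permitted set denotes a system-wide Deployment user.
        if (permittedSiteIds.Count == 0)
            return true;

        // Otherwise the instance's site must be in the permitted set.
        return permittedSiteIds.Contains(instanceSiteId);
    }
}
```

Keeping the decision free of hub state is what lets the regression tests above exercise it without a SignalR connection.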
### ManagementService-004 — Actor offloads work to Task.Run instead of using PipeTo
| | |
|--|--|
| Severity | Medium |
| Category | Akka.NET conventions |
| Status | Resolved |
| Location | `src/ScadaLink.ManagementService/ManagementActor.cs:61` |
**Description**
`HandleEnvelope` runs every command on a thread-pool thread via `Task.Run(async () => ...)`
and replies from inside the continuation. This is the anti-pattern the project's Akka.NET
conventions warn against — the canonical approach is to start the async work and `PipeTo`
its result back to `Self`/`Sender`. Although `Sender` is correctly copied to a local before
the closure, the current code: (a) lets multiple commands execute fully concurrently with no
actor-thread serialization, so the actor provides no ordering or back-pressure guarantees
and is an actor in name only; (b) cannot be paused, supervised, or made to honour a mailbox
bound; (c) is shielded from synchronous faults only because every path is inside the
try/catch — any future code path that throws synchronously before the `Task.Run` body would
escape it.
**Recommendation**
Replace `Task.Run` with a method that returns the `Task` and `PipeTo` the mapped result
(`ManagementSuccess`/`ManagementError`/`ManagementUnauthorized`) back to the captured sender,
mapping faults in the `PipeTo` failure continuation. If genuine parallelism is desired, make
that explicit with a router/dispatcher rather than ad-hoc `Task.Run`.
**Resolution**
Resolved 2026-05-16 (commit pending). Confirmed: `HandleEnvelope` ran every command via
`Task.Run` and replied from inside the continuation, contrary to the project's PipeTo
convention. Replaced it with a `ProcessCommand` method returning a `Task<object>` and
`PipeTo(sender, success, failure)`; faults are now mapped uniformly in a `MapFault` failure
continuation (`SiteScopeViolationException` -> `ManagementUnauthorized`, otherwise
`ManagementError`), which also unwraps `AggregateException`. Regression test:
`UnknownCommandType_FaultMappedToManagementError`. Existing success/error/unauthorized
mapping tests confirm behaviour is preserved.
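As a rough illustration of the fault mapping described here, the sketch below unwraps an `AggregateException` and picks the reply type. The reply records and the exception class are hypothetical stand-ins (the real contracts live in Commons and likely carry more fields), and returning `ex.Message` is only for illustration; finding 016 covers what text is safe to send to callers.

```csharp
using System;

// Hypothetical reply messages and exception for this sketch only.
public sealed record ManagementUnauthorized(string Message);
public sealed record ManagementError(string Message);

public sealed class SiteScopeViolationException : Exception
{
    public SiteScopeViolationException(string message) : base(message) { }
}

public static class FaultMapping
{
    public static object MapFault(Exception ex)
    {
        // A faulted Task can surface its failure wrapped in an AggregateException;
        // peel off single-inner wrappers to reach the underlying cause.
        while (ex is AggregateException agg && agg.InnerExceptions.Count == 1)
            ex = agg.InnerExceptions[0];

        return ex is SiteScopeViolationException
            ? new ManagementUnauthorized(ex.Message)
            : new ManagementError(ex.Message);
    }
}
```

In the actor, a function of this shape would be supplied as the `failure` continuation of `PipeTo`, so every handler shares one mapping instead of its own try/catch.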
### ManagementService-005 — ManagementActor declares no supervision strategy
| | |
|--|--|
| Severity | Low |
| Category | Akka.NET conventions |
| Status | Resolved |
| Location | `src/ScadaLink.ManagementService/ManagementActor.cs:33` |
**Description**
The project conventions call for explicit supervision strategies (Resume for coordinator
actors). `ManagementActor` is a long-lived coordinator-style actor but overrides no
`SupervisorStrategy` and defines no `PreRestart`/`PostRestart` behaviour. In practice it
spawns no children so the default strategy is rarely exercised, but an explicit strategy
should still be declared for clarity and to match the documented convention; it also matters
if children are added later (e.g. if finding 004 introduces worker actors).
**Recommendation**
Add an explicit `protected override SupervisorStrategy SupervisorStrategy()` returning a
Resume-based strategy, consistent with other central coordinator actors.
**Resolution**
Resolved 2026-05-16 (commit pending). Confirmed: `ManagementActor` declared no
`SupervisorStrategy`. Added a `public static SupervisorStrategy CreateSupervisorStrategy()`
factory returning an unbounded `OneForOneStrategy` with a `Directive.Resume` decider, and a
`protected override SupervisorStrategy SupervisorStrategy()` that delegates to it, matching the Resume-based
convention of `CentralCommunicationActor`/`SiteCommunicationActor`. The actor spawns no
children today, so this is a forward-looking correctness fix. Regression tests:
`CreateSupervisorStrategy_ReturnsOneForOneStrategy`,
`CreateSupervisorStrategy_ResumesOnArbitraryException`,
`CreateSupervisorStrategy_ResumesIndefinitely` (new `ManagementActorSupervisionTests.cs`).
### ManagementService-006 — JsonDocument instances never disposed in the HTTP endpoint
| | |
|--|--|
| Severity | Medium |
| Category | Performance & resource management |
| Status | Resolved |
| Location | `src/ScadaLink.ManagementService/ManagementEndpoints.cs:83`, `:112` |
**Description**
`JsonDocument` is `IDisposable` (it rents buffers from a pooled `ArrayPool`). `HandleRequest`
parses the request body into `doc` at line 83 and never disposes it, and line 112
(`JsonDocument.Parse("{}")`) allocates a second document inline that is also never disposed.
Every management HTTP call therefore leaks pooled buffers, increasing GC pressure and pool
churn under load.
**Recommendation**
Wrap the parsed document in `using var doc = ...`. For the empty-payload fallback, avoid
allocating a `JsonDocument` entirely — deserialize from the literal string `"{}"`/an empty
object, or restructure so the fallback path does not parse a throwaway document.
**Resolution**
Resolved 2026-05-16 (commit pending). Confirmed: the request `JsonDocument` was never
disposed and the empty-payload path allocated a second throwaway `JsonDocument`. Extracted
request parsing into a testable `ManagementEndpoints.ParseCommand` helper that wraps the
document in `using`; the missing-payload case now deserializes from the `"{}"` literal
string rather than parsing a throwaway document. Regression tests:
`ParseCommand_WithExplicitPayload_DeserializesIntoCommandType`,
`ParseCommand_WithMissingPayload_DeserializesParameterlessCommand`,
`ParseCommand_WithInvalidJson_ReturnsFailure`,
`ParseCommand_WithMissingCommandField_ReturnsFailure`,
`ParseCommand_WithUnknownCommand_ReturnsFailure`.
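A minimal sketch of a parsing helper along these lines follows. The wire field names (`command`, `payload`), the two command records, and the dictionary registry are assumptions made for the example; only the `using` disposal and the `"{}"` literal fallback reflect what the resolution describes.

```csharp
#nullable enable
using System;
using System.Collections.Generic;
using System.Text.Json;

// Hypothetical command records used only for this sketch.
public sealed record GetSiteCommand(int SiteId);
public sealed record ListSitesCommand();

public static class CommandParsing
{
    private static readonly JsonSerializerOptions Options = new() { PropertyNameCaseInsensitive = true };

    // Stand-in for the command registry: maps a wire name to a CLR type.
    private static readonly Dictionary<string, Type> Registry = new()
    {
        ["GetSite"] = typeof(GetSiteCommand),
        ["ListSites"] = typeof(ListSitesCommand),
    };

    // Returns the deserialized command, or null plus an error message on failure.
    public static (object? Command, string? Error) ParseCommand(string body)
    {
        try
        {
            // `using` returns the document's pooled buffers when the method exits.
            using var doc = JsonDocument.Parse(body);
            var root = doc.RootElement;

            if (root.ValueKind != JsonValueKind.Object ||
                !root.TryGetProperty("command", out var nameElement) ||
                nameElement.ValueKind != JsonValueKind.String)
                return (null, "Missing 'command' field.");

            if (!Registry.TryGetValue(nameElement.GetString()!, out var type))
                return (null, "Unknown command.");

            // Missing payload: deserialize from the literal "{}" instead of
            // parsing a second throwaway JsonDocument.
            object? command = root.TryGetProperty("payload", out var payload)
                ? payload.Deserialize(type, Options)
                : JsonSerializer.Deserialize("{}", type, Options);

            return (command, null);
        }
        catch (JsonException ex)
        {
            return (null, $"Invalid JSON: {ex.Message}");
        }
    }
}
```

Because the command object is fully materialized before the method returns, nothing that escapes the helper still references the disposed document.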
### ManagementService-007 — Inconsistent and cycle-prone serialization of repository entities
| | |
|--|--|
| Severity | Medium |
| Category | Code organization & conventions |
| Status | Resolved |
| Location | `src/ScadaLink.ManagementService/ManagementActor.cs:67`; `src/ScadaLink.ManagementService/ManagementEndpoints.cs:113` |
**Description**
The actor serializes every command result with `Newtonsoft.Json` (`JsonConvert.SerializeObject`)
while the HTTP endpoint deserializes payloads with `System.Text.Json`. Beyond the
inconsistency, `JsonConvert.SerializeObject` is applied directly to EF-backed entities
returned by repositories (e.g. `Site`, `DataConnection`, `NotificationList` with a
`Recipients` collection, `Template` with children). With default Newtonsoft settings any
bidirectional navigation property produces a `JsonSerializationException` for self-referencing
loops, and even without cycles this serializes lazy/navigation state the CLI does not expect.
**Recommendation**
Standardise on one serializer (the rest of the HTTP path uses `System.Text.Json`). Serialize
explicit DTOs / projections rather than EF entities, or configure
`ReferenceLoopHandling.Ignore` and ignore navigation properties. Verify that handlers
returning rich entity graphs (`HandleGetTemplate`, `HandleUpdateNotificationList`) round-trip
correctly.
**Resolution**
Resolved 2026-05-16 (commit pending). Confirmed: the actor serialized results with
`Newtonsoft.Json` (not even a direct package reference) while the HTTP endpoint uses
`System.Text.Json`. Standardised the actor on `System.Text.Json` via a new
`ManagementActor.SerializeResult` helper using a shared `JsonSerializerOptions` with
`ReferenceHandler.IgnoreCycles` (cycle-safe for EF entity graphs) and camelCase naming
(matches the CLI's case-insensitive deserializer). Removed the `Newtonsoft.Json` import.
Regression tests: `SerializeResult_WithCyclicGraph_DoesNotThrow`,
`SerializeResult_UsesCamelCasePropertyNames`.
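The serializer configuration this resolution describes could be sketched as below. The `Site`/`Area` pair is a hypothetical model with a bidirectional navigation, included only to show that a cyclic graph serializes without throwing; the helper name mirrors the one in the resolution but its body is an assumption.

```csharp
#nullable enable
using System.Collections.Generic;
using System.Text.Json;
using System.Text.Json.Serialization;

// Hypothetical entity pair with a back-reference, as an EF model might have.
public sealed class Site
{
    public int Id { get; set; }
    public string Name { get; set; } = "";
    public List<Area> Areas { get; } = new();
}

public sealed class Area
{
    public string Name { get; set; } = "";
    public Site? Site { get; set; }
}

public static class ResultSerialization
{
    // One shared options instance, reused across calls.
    private static readonly JsonSerializerOptions Options = new()
    {
        // A reference that would close a cycle is written as null instead of throwing.
        ReferenceHandler = ReferenceHandler.IgnoreCycles,
        PropertyNamingPolicy = JsonNamingPolicy.CamelCase,
    };

    public static string SerializeResult(object? result) =>
        JsonSerializer.Serialize(result, Options);
}
```

Note that `IgnoreCycles` only prevents the exception; navigation state that is not part of a cycle is still emitted, so projecting to DTOs remains the stricter option mentioned in the recommendation.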
### ManagementService-008 — HandleResolveRoles constructs RoleMapper manually instead of via DI
| | |
|--|--|
| Severity | Low |
| Category | Correctness & logic bugs |
| Status | Resolved |
| Location | `src/ScadaLink.ManagementService/ManagementActor.cs:285` |
**Description**
Every other handler resolves its collaborators from the scoped `IServiceProvider`.
`HandleResolveRoles` instead does `new RoleMapper(sp.GetRequiredService<ISecurityRepository>())`,
bypassing DI. If `RoleMapper` ever gains a dependency, caching, or options, this hand-built
instance silently diverges from the DI-registered one. It is also inconsistent with
`ManagementEndpoints`, which resolves `RoleMapper` from DI.
**Recommendation**
Resolve `RoleMapper` via `sp.GetRequiredService<RoleMapper>()` like every other dependency.
**Resolution**
Resolved 2026-05-16 (commit pending). Confirmed: `HandleResolveRoles` did
`new RoleMapper(sp.GetRequiredService<ISecurityRepository>())`, bypassing the
`AddScoped<RoleMapper>()` DI registration. The hand-built `RoleMapper` lived only inside
`HandleResolveRoles`, which is itself the dead-code dispatch path removed under finding 011
(the two-step ResolveRoles flow is retired). Resolving 011 by deleting the
`ResolveRolesCommand` dispatch case and `HandleResolveRoles` handler also removes the only
manually-constructed `RoleMapper` in the module, so the DI-bypass no longer exists. No
remaining `new RoleMapper` in `src/ScadaLink.ManagementService`. Regression covered by
`ResolveRolesCommand_IsNoLongerDispatched_ReturnsManagementError`.
### ManagementService-009 — Audit logging applied inconsistently across mutating handlers
| | |
|--|--|
| Severity | Low — re-triaged from Medium; the claimed audit gap does not exist (see Description), leaving only an undocumented-convention issue. |
| Category | Code organization & conventions |
| Status | Resolved |
| Location | `src/ScadaLink.ManagementService/ManagementActor.cs:357`, `:1134`, `:1085`, `:526`, `:1275` |
**Description**
The design doc states "All mutating operations are audit logged." Some handlers call
`AuditAsync` explicitly (`HandleCreateInstance`, `HandleCreateSite`, all repository-direct
external-system/notification/security/area mutations), but the handlers that delegate to a
domain service do **not**: `HandleCreateTemplate`/`HandleUpdateTemplate`/`HandleDeleteTemplate`,
all template-member handlers (`HandleAddAttribute` ... `HandleDeleteComposition`), template-folder
handlers, shared-script handlers, `HandleDeployArtifacts`, `HandleDeployInstance`,
`HandleEnableInstance`/`Disable`/`Delete`, and the instance-binding/override handlers.
**Re-triage (2026-05-16):** the original finding claimed this "creates a real risk of silent
audit gaps for template authoring and deployment operations." That claim was verified against
the actual sources and is **false**. Every domain service the delegating handlers call —
`TemplateService`, `SharedScriptService`, `InstanceService`, `AreaService`, `SiteService`,
`TemplateFolderService`, `DeploymentService`, `ArtifactDeploymentService` — injects
`IAuditService` and calls `LogAsync` on every mutation (`grep` confirms an `_auditService.LogAsync`
call after each `Create`/`Update`/`Delete` in `TemplateService.cs`, `DeploymentService.cs`,
`ArtifactDeploymentService.cs`, etc.). There is therefore no audit gap; if anything, adding
explicit `AuditAsync` to a delegating handler would *double-log*. The genuine issue is purely
organizational: the two-layer split (actor audits repo-direct mutations, services audit their
own) was undocumented, which is what made it look risky. This is a Low-severity
code-organization issue, not a Medium error-handling/resilience defect.
**Recommendation**
Document the chosen contract so the split cannot be misread as a gap. (The original
alternative — moving all auditing into the actor — would require un-auditing eight services
and is not warranted given they already audit correctly.)
**Resolution**
Resolved 2026-05-16 (commit pending). Re-triaged to Low / Code organization after verifying
all eight delegated-to services audit internally — no audit gap exists. Documented the
two-layer audit contract in an XML `<remarks>` block on `ManagementActor.AuditAsync`:
repository-direct mutations call `AuditAsync`; service-delegating handlers must not, because
the services own auditing and a duplicate call would double-log. No behavioural change, so
no new regression test; existing `CreateInstanceCommand_WithDeploymentRole_ReturnsSuccess`
covers the explicit-audit path.
### ManagementService-010 — ManagementServiceOptions.CommandTimeout is defined but never used
| | |
|--|--|
| Severity | Low |
| Category | Design-document adherence |
| Status | Resolved |
| Location | `src/ScadaLink.ManagementService/ManagementServiceOptions.cs:5`; `src/ScadaLink.ManagementService/ManagementEndpoints.cs:16` |
**Description**
`ManagementServiceOptions.CommandTimeout` is bound from configuration in
`ServiceCollectionExtensions`, but no code reads it. The HTTP endpoint instead hard-codes
`AskTimeout = TimeSpan.FromSeconds(30)`. The design doc describes the options section as
"Reserved for future configuration — e.g., command timeout overrides", yet a concrete
`CommandTimeout` property already exists and is silently ignored, so an operator who sets it
in `appsettings.json` gets no effect.
**Recommendation**
Either consume `ManagementServiceOptions.CommandTimeout` in `ManagementEndpoints.HandleRequest`
(inject `IOptions<ManagementServiceOptions>`), or remove the property until it is wired up so
configuration cannot be set with no effect.
**Resolution**
Resolved 2026-05-16 (commit pending). Confirmed: `ManagementEndpoints` hard-coded
`AskTimeout = TimeSpan.FromSeconds(30)` and never read `ManagementServiceOptions.CommandTimeout`.
`HandleRequest` now resolves `IOptions<ManagementServiceOptions>` from `context.RequestServices`
and computes the Ask timeout via a new `ManagementEndpoints.ResolveAskTimeout` helper, which
returns the configured `CommandTimeout` when strictly positive and otherwise falls back to the
30s default (guarding against a misconfigured zero/negative value that would fail every call).
Regression tests: `ResolveAskTimeout_UsesConfiguredCommandTimeout`,
`ResolveAskTimeout_WithNullOptions_FallsBackToDefault`,
`ResolveAskTimeout_WithNonPositiveTimeout_FallsBackToDefault`.
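The fallback rule described above is small enough to sketch directly. To keep the example dependency-free it takes the options object itself rather than `IOptions<ManagementServiceOptions>`, and the options class shown is a hypothetical reduction of the real one.

```csharp
#nullable enable
using System;

// Hypothetical options shape; the real class is bound from configuration.
public sealed class ManagementServiceOptions
{
    public TimeSpan CommandTimeout { get; set; }
}

public static class AskTimeouts
{
    public static readonly TimeSpan DefaultAskTimeout = TimeSpan.FromSeconds(30);

    // Uses the configured value only when it is strictly positive;
    // null options or a zero/negative timeout fall back to the default.
    public static TimeSpan ResolveAskTimeout(ManagementServiceOptions? options)
    {
        if (options is not null && options.CommandTimeout > TimeSpan.Zero)
            return options.CommandTimeout;
        return DefaultAskTimeout;
    }
}
```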
### ManagementService-011 — ResolveRolesCommand dispatch path is stale dead code
| | |
|--|--|
| Severity | Low |
| Category | Correctness & logic bugs |
| Status | Resolved |
| Location | `src/ScadaLink.ManagementService/ManagementActor.cs:273`, `:283` |
**Description**
The design doc states the HTTP endpoint "collapses the CLI's previous two-step flow
(ResolveRoles + actual command) into a single HTTP round-trip", and indeed `ManagementEndpoints`
performs LDAP auth and role resolution itself before dispatching. The `ResolveRolesCommand`
case in `DispatchCommand` is therefore unreachable from the HTTP path. It remains reachable
only via a raw ClusterClient sender, but a caller able to send `ResolveRolesCommand` could
enumerate role mappings for arbitrary LDAP groups with no role requirement
(`GetRequiredRole` returns null for it) — a minor information-disclosure surface for a path
the design says no longer exists.
**Recommendation**
If the two-step flow is genuinely retired, remove `ResolveRolesCommand`, its handler, and the
class. If it must remain for non-HTTP clients, document why and confirm exposing role-mapping
data unauthenticated is intended.
**Resolution**
Resolved 2026-05-16 (commit pending). Confirmed dead path: a repository-wide search found no
`ResolveRolesCommand` sender outside `ManagementActor` itself — the CLI and HTTP endpoint
perform LDAP auth + role resolution inline. Removed the `ResolveRolesCommand` dispatch case
and the `HandleResolveRoles` handler from `ManagementActor`; a stray ClusterClient sender now
falls through to the `NotSupportedException` default and gets a uniform `ManagementError`
(closing the unauthenticated role-mapping enumeration surface, since `GetRequiredRole`
returned null for it). A code comment at the former dispatch site documents the intentional
omission. Note: the `ResolveRolesCommand` *record* itself lives in
`src/ScadaLink.Commons/Messages/Management/SecurityCommands.cs` and was left in place — that
file is outside this module's permitted edit scope; deleting the orphan record should be done
as a Commons-module follow-up. With the handler removed it is now an inert,
registry-only type with no behaviour. Regression test:
`ResolveRolesCommand_IsNoLongerDispatched_ReturnsManagementError`.
### ManagementService-012 — ManagementEnvelope carries a loosely-typed object payload
| | |
|--|--|
| Severity | Low |
| Category | Akka.NET conventions |
| Status | Deferred |
| Location | `src/ScadaLink.Commons/Messages/Management/ManagementEnvelope.cs:7`; `src/ScadaLink.ManagementService/ManagementActor.cs:132` |
**Description**
`ManagementEnvelope.Command` is typed `object`, so the actor relies on a large open-ended
`switch` with a `NotSupportedException` default for unknown types. While the individual
command records are immutable, `object` defeats compile-time exhaustiveness — adding a new
command record produces no compiler signal that `DispatchCommand` (and `GetRequiredRole`)
need updating, and a typo or unregistered command surfaces only as a runtime exception. The
message contract is also harder to evolve safely under the additive-only rule.
**Recommendation**
Introduce a marker interface (e.g. `IManagementCommand`) implemented by every command record
and type the envelope payload as that interface. This documents the contract, lets analyzers
flag unhandled cases, and keeps `ManagementCommandRegistry`'s reflection scan precise.
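
The typing change can be sketched as follows. This is a minimal illustration under stated assumptions, not the Commons implementation: the two command records, the two-field envelope, and the `Dispatcher` class are simplified stand-ins invented for the example. Note that an interface is an open type, so the compiler still cannot prove the `switch` exhaustive and the default arm stays; the gain is that non-command payloads no longer compile, and an analyzer or registry-driven test has a precise set of types to check.

```csharp
using System;

// Hypothetical marker interface; the name follows the recommendation above.
public interface IManagementCommand { }

// Stand-in command records (assumed shapes, not the real Commons types).
public sealed record ListSitesCommand() : IManagementCommand;
public sealed record QueryDeploymentsCommand(int? InstanceId) : IManagementCommand;

// Envelope payload typed as the marker interface instead of object.
public sealed record ManagementEnvelope(IManagementCommand Command, string CorrelationId);

public static class Dispatcher
{
    // Returns a label per command so the routing is easy to observe.
    public static string Dispatch(ManagementEnvelope envelope) => envelope.Command switch
    {
        ListSitesCommand => "list-sites",
        QueryDeploymentsCommand q => $"query-deployments:{q.InstanceId}",
        // Still required: new IManagementCommand implementers can appear at any time.
        _ => throw new NotSupportedException(envelope.Command.GetType().Name),
    };
}
```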
**Resolution**
Deferred 2026-05-16. Finding verified as genuine: `ManagementEnvelope.Command` is typed
`object` and the recommended `IManagementCommand` marker-interface fix is sound. However, the
fix cannot be implemented within the `ManagementService` module: both `ManagementEnvelope` and
all ~50 `*Command` records live in `src/ScadaLink.Commons/Messages/Management/` (17 files),
which is outside this work item's permitted edit scope (`src/ScadaLink.ManagementService/**`,
its tests, and this findings file only). Adding the marker interface, retyping the envelope,
and having `ManagementCommandRegistry` constrain its reflection scan to `IManagementCommand`
implementers is a cohesive Commons-module change and must be done there — also so the Commons
message-contract additive-only evolution rules are respected. Deferred to a Commons-module
work item; no `ManagementService`-local change is appropriate.
### ManagementService-013 — No tests for site-scope enforcement, the HTTP endpoint, or DebugStreamHub
| | |
|--|--|
| Severity | Medium |
| Category | Testing coverage |
| Status | Resolved |
| Location | `tests/ScadaLink.ManagementService.Tests/ManagementActorTests.cs:1` |
**Description**
`ManagementActorTests` covers role-based authorization, success/error mapping, and correlation
IDs thoroughly, but several critical paths are untested: (a) site-scope enforcement —
`EnforceSiteScope`/`EnforceSiteScopeForInstance` and `SiteScopeViolationException` -> `Unauthorized`
mapping have no test, which is why the gaps in findings 001/002 went unnoticed; (b)
`ManagementEndpoints` — Basic Auth decoding, malformed-header handling, LDAP/role resolution,
command deserialization, and HTTP status mapping have zero coverage; (c) `DebugStreamHub`
authentication, subscribe/unsubscribe lifecycle, and `ManagementCommandRegistry.Resolve` are
untested. The `Envelope` test helper always passes `Array.Empty<string>()` for permitted
sites, so no test ever exercises a site-scoped user.
**Recommendation**
Add tests that exercise a site-scoped Deployment user against in-scope and out-of-scope
targets for instance and site operations, asserting `ManagementUnauthorized` on violations.
Add `WebApplicationFactory`-based tests for `ManagementEndpoints` covering auth failures,
malformed bodies, unknown commands, and the 200/400/403/401/504 mappings.
**Resolution**
Resolved 2026-05-16 (commit pending). The site-scope and `DebugStreamHub` coverage gaps
were closed by the resolution of findings 001/002/003 (the `ScopedEnvelope` helper plus the
`*_OutOfScopeForSiteScopedUser_ReturnsUnauthorized` tests and `DebugStreamHubTests`). The
remaining HTTP-endpoint gap is now covered by a new `ManagementEndpointsTests.cs` exercising
`ManagementEndpoints.ParseCommand` — command deserialization, malformed JSON, missing
`command` field, and unknown commands. Full `WebApplicationFactory` auth-flow tests were
deliberately not added: `ManagementEndpoints` depends on `LdapAuthService` and live LDAP
infrastructure, so the testable command-parsing/dispatch logic was extracted into the pure
`ParseCommand` helper and covered instead. Tests: `ParseCommand_*` (5),
`SerializeResult_*` (2), `UnknownCommandType_FaultMappedToManagementError`, plus the
pre-existing site-scope and DebugStreamHub suites. `dotnet test` -> 48 passed.
### ManagementService-014 — HandleQueryDeployments bypasses site-scope enforcement
| | |
|--|--|
| Severity | High |
| Category | Security |
| Status | Resolved |
| Location | `src/ScadaLink.ManagementService/ManagementActor.cs:306`, `:1174` |
**Description**
`QueryDeploymentsCommand` is gated to the `Deployment` role by `GetRequiredRole`
(`ManagementActor.cs:170`–`:177`), and the design document's Authorization section states
"Site scoping is enforced for site-scoped Deployment users" and explicitly lists deployments
among the Deployment-role operations. `HandleQueryDeployments` makes no call to
`EnforceSiteScope` / `EnforceSiteScopeForInstance` / `EnforceSiteScopeForIdentifier`: with no
`InstanceId` it returns `IDeploymentManagerRepository.GetAllDeploymentRecordsAsync()` (every
deployment record across all sites), and with an `InstanceId` it returns that instance's
deployment history with no check that the instance's site is within the caller's permitted
set. A site-scoped Deployment user scoped to site A can therefore enumerate deployment
records for instances at site B — instance IDs, `DeployedBy` (operator usernames),
timestamps, deployment status, and `ErrorMessage` content — by issuing `QueryDeployments`
with or without an out-of-scope `InstanceId`. This is the same authorization-bypass class as
the resolved findings 001/002, on a handler that the earlier sweep did not cover; it is the
only `Deployment`-role handler in `DispatchCommand` with no scope enforcement.
**Recommendation**
Thread `AuthenticatedUser` into `HandleQueryDeployments` (the dispatch case at line 306
already has `user` in scope). When `cmd.InstanceId` is supplied, call
`EnforceSiteScopeForInstance` before querying. When it is not supplied, filter the returned
`DeploymentRecord` list to the caller's permitted sites — resolve each record's instance to
its `SiteId` (or join through a site-aware repository query) and drop records for sites
outside `PermittedSiteIds`, mirroring the `HandleListInstances` / `HandleListSites` filter
pattern. Add a regression test for a site-scoped user against in-scope and out-of-scope
instances.
**Resolution**
Resolved 2026-05-17 (commit pending). **Re-triage:** the finding understated the gap — it
claimed `QueryDeploymentsCommand` was already "gated to the `Deployment` role by
`GetRequiredRole` (lines 170–177)". Verified against the source: `QueryDeploymentsCommand`
appeared nowhere in `GetRequiredRole`, so it required *no* role at all — any authenticated
user could read every deployment record system-wide. Fix applied both gates: added
`QueryDeploymentsCommand` to the `Deployment`-role group in `GetRequiredRole`, and threaded
`AuthenticatedUser` into `HandleQueryDeployments` — the `InstanceId` branch now calls
`EnforceSiteScopeForInstance`; the unfiltered branch resolves each record's instance to its
`SiteId` (cached) and drops records outside `PermittedSiteIds`, mirroring `HandleListInstances`.
Regression tests: `QueryDeployments_WithDesignRole_ReturnsUnauthorized`,
`QueryDeployments_FilteredByOutOfScopeInstance_ReturnsUnauthorized`,
`QueryDeployments_FilteredByInScopeInstance_ReturnsRecords`,
`QueryDeployments_UnfilteredForSiteScopedUser_DropsOutOfScopeRecords`,
`QueryDeployments_UnfilteredForAdminUser_ReturnsAllRecords`.
### ManagementService-015 — HandleSetInstanceOverrides applies overrides non-atomically
| | |
|--|--|
| Severity | Medium |
| Category | Error handling & resilience |
| Status | Resolved |
| Location | `src/ScadaLink.ManagementService/ManagementActor.cs:647`–`:659` |
**Description**
`HandleSetInstanceOverrides` iterates `cmd.Overrides` and calls
`InstanceService.SetAttributeOverrideAsync` once per attribute, throwing
`InvalidOperationException` on the first `result.IsSuccess == false`. Each
`SetAttributeOverrideAsync` call persists independently, so if the command supplies five
overrides and the third fails (e.g. an unknown attribute name or a validation error), the
first two overrides are already committed to the configuration database while the caller
receives a `ManagementError`. The instance is left partially mutated in a state the operator
neither sees nor requested, and the per-instance operation lock referenced in the design's
deployment decisions does not protect against this because the partial writes are committed
before the throw. A retry of the same command then re-applies the already-applied overrides.
**Recommendation**
Make the multi-override mutation all-or-nothing: either validate every requested override up
front before applying any, or apply all overrides within a single transaction / unit-of-work
so a mid-batch failure rolls back the earlier writes. If `InstanceService` cannot offer a
batch method, at minimum document the partial-application behaviour on `SetInstanceOverridesCommand`
and have the handler report which overrides were applied before the failure so the caller
can reconcile.
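
The "validate every override up front" option can be sketched as a pure function that collects all problems before any write happens. This is an illustration only: `AttributeDefinition`, `OverrideBatchValidator`, and the dictionary shape of the overrides are assumptions made for the example, and a database fault during the later apply phase is not addressed by this approach. A handler would call `Validate` first and raise a single error listing every problem before the first `SetAttributeOverrideAsync` call.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical view of a template attribute; the real entity shape is not shown in this review.
public sealed record AttributeDefinition(string Name, bool IsLocked);

public static class OverrideBatchValidator
{
    // Returns one message per invalid override; an empty list means the whole batch may be applied.
    public static IReadOnlyList<string> Validate(
        IReadOnlyDictionary<string, string> overrides,
        IEnumerable<AttributeDefinition> templateAttributes)
    {
        var byName = templateAttributes.ToDictionary(a => a.Name, StringComparer.Ordinal);
        var errors = new List<string>();

        foreach (var name in overrides.Keys)
        {
            if (!byName.TryGetValue(name, out var attribute))
                errors.Add($"Unknown attribute '{name}'.");
            else if (attribute.IsLocked)
                errors.Add($"Attribute '{name}' is locked.");
        }

        return errors;
    }
}
```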
**Resolution**
Resolved 2026-05-17 (commit pending). Confirmed: each `SetAttributeOverrideAsync` call
commits independently, so a mid-batch failure left earlier overrides persisted.
`HandleSetInstanceOverrides` now validates every requested attribute up front against the
instance's template (exists, not locked) and only begins applying once the whole batch is
known valid — eliminating the realistic partial-mutation failure modes (unknown / locked
attribute). `InstanceService` is outside this module's edit scope and offers no batch/
transactional method, so a genuine database fault mid-apply remains theoretically possible;
this residual is documented in a code comment on the handler. Regression tests:
`SetInstanceOverrides_WithOneInvalidAttribute_PersistsNoOverrides`,
`SetInstanceOverrides_AllValid_PersistsAllOverrides`.
### ManagementService-016 — Unexpected exception messages returned verbatim to HTTP callers
| | |
|--|--|
| Severity | Low |
| Category | Security |
| Status | Resolved |
| Location | `src/ScadaLink.ManagementService/ManagementActor.cs:121`–`:131` |
**Description**
`MapFault` maps any non-`SiteScopeViolationException` fault to
`new ManagementError(correlationId, cause.Message, "COMMAND_FAILED")`, and
`ManagementEndpoints.HandleRequest` returns that `Error` string directly in the HTTP 400
body. For handler-thrown `InvalidOperationException`s carrying a curated `result.Error`
message this is intended and safe. But the same path also surfaces the raw `.Message` of
unanticipated exceptions — a `SqlException` (which can include server/database names and
constraint details), a `DbUpdateException`, an `ArgumentException` from `Enum.Parse` on a
malformed `DataType`/`TriggerType` value, or a `NullReferenceException` — straight to the
external CLI/HTTP client. This is a minor internal-detail disclosure surface: the exception
text is already logged server-side with full context, so the client does not need the raw
message.
**Recommendation**
Distinguish handler-curated failures from unexpected faults. Have handlers throw a dedicated
exception type (e.g. `ManagementCommandException`) for messages that are safe to surface, and
in `MapFault` return that message for the known type while returning a generic
"An internal error occurred (CorrelationId=...)" string for everything else — the operator
can still correlate to the server log via the correlation ID.
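
A minimal sketch of the curated-versus-unexpected split described above. The `ManagementError` record here is a stand-in whose property names are assumed from the three-argument constructor call quoted in the description; `FaultMapper` is a hypothetical container, the error code used for unexpected faults is an assumption, and the `SiteScopeViolationException` branch is omitted for brevity.

```csharp
using System;

// Dedicated type for messages that are safe to show to the caller.
public sealed class ManagementCommandException : Exception
{
    public ManagementCommandException(string message) : base(message) { }
}

// Stand-in for the real response record (property names are assumptions).
public sealed record ManagementError(string CorrelationId, string Error, string ErrorCode);

public static class FaultMapper
{
    public static ManagementError MapFault(string correlationId, Exception cause) => cause switch
    {
        // Curated handler failure: surface the message verbatim.
        ManagementCommandException curated =>
            new ManagementError(correlationId, curated.Message, "COMMAND_FAILED"),

        // Anything else: generic text only; the full exception is logged server-side elsewhere.
        _ => new ManagementError(
            correlationId,
            $"An internal error occurred (CorrelationId={correlationId})",
            "COMMAND_FAILED"),
    };
}
```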
**Resolution**
Resolved 2026-05-17 (commit pending). Added a `ManagementCommandException` type for curated,
caller-safe failures and converted every curated `throw` in `ManagementActor` (the 34
`result.Error` rethrows and 15 "not found" / delete-blocked messages) to it. `MapFault` now
returns the message verbatim only for `ManagementCommandException`; any other fault (DB
errors, `Enum.Parse` `ArgumentException`, `NullReferenceException`, the unknown-command
`NotSupportedException`, etc.) yields a generic `"An internal error occurred
(CorrelationId=...)"` string while the full exception is still logged server-side. Regression
tests: `UnexpectedFault_ReturnsGenericMessage_NotRawExceptionText`,
`CuratedHandlerFailure_SurfacesTheCuratedMessage`; the two pre-existing
`*_WhenRepoThrows_*` tests were updated to assert the raw repository message is no longer
leaked.
### ManagementService-017 — QueryDeploymentsCommand has no test coverage
| | |
|--|--|
| Severity | Low |
| Category | Testing coverage |
| Status | Resolved |
| Location | `tests/ScadaLink.ManagementService.Tests/ManagementActorTests.cs:1` |
**Description**
`QueryDeploymentsCommand` / `HandleQueryDeployments` is exercised by no test in
`ManagementActorTests`. There is no test that it requires the `Deployment` role, no test of
the `InstanceId`-filtered versus unfiltered branches, and — because the handler performs no
site-scope enforcement at all — no test that would have caught finding 014. The deployment
query is one of the operations the design's Authorization section calls out for site
scoping, yet it is the only `Deployment`-role command with neither an authorization test nor
a site-scope test.
**Recommendation**
Add tests for `QueryDeploymentsCommand`: a role test (Design/no-role caller →
`ManagementUnauthorized`), branch coverage for the `InstanceId`-filtered and unfiltered
repository calls, and — once finding 014 is fixed — site-scope tests for a site-scoped
Deployment user against in-scope and out-of-scope deployment records.
**Resolution**
Resolved 2026-05-17 (commit pending). Added seven `QueryDeployments_*` tests to
`ManagementActorTests`: role gate (`_WithDesignRole_ReturnsUnauthorized`), the
`InstanceId`-filtered and unfiltered branches (`_FilteredByInstanceId_ReturnsInstanceRecords`,
`_UnfilteredWithDeploymentRole_ReturnsAllRecords`), and site-scope coverage for a site-scoped
Deployment user and an Admin user, in- and out-of-scope
(`_FilteredByOutOfScopeInstance_ReturnsUnauthorized`, `_FilteredByInScopeInstance_ReturnsRecords`,
`_UnfilteredForSiteScopedUser_DropsOutOfScopeRecords`, `_UnfilteredForAdminUser_ReturnsAllRecords`).
### ManagementService-018 — QueryAuditLogCommand has no role gate
| | |
|--|--|
| Severity | High |
| Category | Security |
| Status | Open |
| Location | `src/ScadaLink.ManagementService/ManagementActor.cs:153`–`:207`, `:336`, `:1302` |
**Description**
`QueryAuditLogCommand` is dispatched at line 336 to `HandleQueryAuditLog`, which calls
`ICentralUiRepository.GetAuditLogEntriesAsync(...)` with no role check, no site-scope
check, and no actor filter. `GetRequiredRole` (lines 153–207) does not list
`QueryAuditLogCommand`, so it falls through to the `_ => null` case — i.e. "read-only
queries — any authenticated user". The parallel `/api/audit/query` endpoint in
`AuditEndpoints.HandleQuery` correctly enforces `AuthorizationPolicies.OperationalAuditRoles`
(`{ "Admin", "Audit", "AuditReadOnly" }`), so a CLI authenticated as a user with only the
`Deployment` role — or no roles at all — is rejected at `/api/audit/query` but can read
the *same* audit log table through `/management` by sending `QueryAuditLogCommand`. The
two surfaces enforce different permissions on the same data; the older
ManagementActor-routed path is the looser one. The audit log records every
script-trust-boundary action and is operationally sensitive; it should not be readable by a default
authenticated user.
This is the same authorization-bypass class as findings 001/002/014 and was missed in
that sweep because `QueryAuditLogCommand` (legacy `Action`/`EntityType` filter) is a
separate command from the new keyset-paged `IAuditLogRepository.QueryAsync` path the
`/api/audit/query` endpoint uses.
**Recommendation**
Add `QueryAuditLogCommand` to `GetRequiredRole`. The natural fit is a new
`"OperationalAudit"`-style role group — but `GetRequiredRole` returns a single string and
the project's existing role gates do too (`Admin`/`Design`/`Deployment`). Two equally
defensible options:
1. Add `QueryAuditLogCommand` to the `Admin`-required group. This is the strict option and is
consistent with `AuditExportRoles`, which includes `Admin`. The CLI's CLI-017/018 audit work uses
`/api/audit/query`, so `QueryAuditLogCommand` may be effectively orphaned anyway.
2. Extend `GetRequiredRole` to return a role *set* and add an `AuditRoles` group equal to
`AuthorizationPolicies.OperationalAuditRoles`, so the two surfaces converge.
Recommended: option 1 plus a deprecation comment on `QueryAuditLogCommand` pointing at
`/api/audit/query` — the legacy command's filter shape is a subset of the new endpoint's,
so the ManagementActor route is redundant. Add a regression test asserting that a
no-role / `Deployment`-only caller gets `ManagementUnauthorized` for `QueryAuditLogCommand`.
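
Option 1 might look roughly like the sketch below. The command records and the `RoleGate` class are simplified stand-ins (the real `GetRequiredRole` lives in `ManagementActor` and covers many more commands); `ListTemplatesCommand` is a hypothetical example of a read-only query that keeps the "any authenticated user" default.

```csharp
#nullable enable

// Stand-in command records; the real ones live in ScadaLink.Commons.
public sealed record QueryAuditLogCommand(string? Action, string? EntityType);
public sealed record UpdateSmtpConfigCommand(string Host);
public sealed record ListTemplatesCommand();

public static class RoleGate
{
    // Returns the single role a command requires, or null for "any authenticated user".
    public static string? GetRequiredRole(object command) => command switch
    {
        // QueryAuditLogCommand joins the Admin-required group instead of falling through.
        UpdateSmtpConfigCommand or QueryAuditLogCommand => "Admin",
        _ => null,
    };
}
```

A regression test would then assert that a caller without `Admin` is rejected before `HandleQueryAuditLog` runs.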
### ManagementService-019 — AuditEndpoints builds PermittedSiteIds but never enforces them
| | |
|--|--|
| Severity | Medium |
| Category | Security |
| Status | Open |
| Location | `src/ScadaLink.ManagementService/AuditEndpoints.cs:358`–`:368`, `:397`–`:437` |
**Description**
`AuditEndpoints.AuthenticateAsync` resolves the caller's roles AND `PermittedSiteIds` and
wraps them in an `AuthenticatedUser` (lines 358–366), but the returned `AuthenticatedUser`
is then only used for the `HasAnyRole(...)` role check on lines 114 and 163 — its
`PermittedSiteIds` are never read. `ParseFilter` (line 397) accepts the caller-supplied
`sourceSiteId=...` query string verbatim and passes it straight into the
`IAuditLogRepository.QueryAsync` filter. A user whose `Audit` (or `AuditReadOnly`) role
mapping carries scope rules — e.g. `AuditReadOnly` scoped to "plant-a" — can still ask
for `sourceSiteId=plant-b` and get back rows for plant-b.
Today this gap is partially benign because the design treats `Audit`/`AuditReadOnly` as
non-site-scoped roles (`Component-AuditLog.md` does not list site scoping for the audit
permissions, and the LDAP role mapping UI does not currently surface site scope rules
for those roles). But (a) the `RoleMapper` will silently honour scope rules attached to
any role, including `Audit`, so an operator who *does* configure them gets a UI that
says "scoped" and an endpoint that ignores the scope — a contract violation; (b) the
`Admin` role's `PermittedSiteIds` are always empty (system-wide), so enforcing for the
other roles is cheap. The asymmetry with the `/management` endpoint — which routes every
site-targeted command through `EnforceSiteScope` — is also a maintenance hazard.
**Recommendation**
Decide explicitly whether the audit endpoints honour site scope. Two options:
1. **Honour scope** — in `HandleQuery` / `HandleExport`, after the role check, intersect
the caller-supplied `filter.SourceSiteIds` with `user.PermittedSiteIds`. If the
caller supplied no `sourceSiteId` and `PermittedSiteIds` is non-empty, restrict to
`PermittedSiteIds`. If the intersection is empty, return an empty page (or a 403 if
the caller explicitly asked for an out-of-scope site).
2. **Document the intentional bypass** — drop the `PermittedSiteIds` field from the
`AuthenticatedUser` constructed in `AuthenticateAsync` (or comment it as "ignored —
audit roles are not site-scoped") so the code stops carrying a value it does not
read, and add an XML doc note on the endpoint class that audit roles are always
system-wide by design.
Recommended: option 1, mirroring the `ManagementActor` pattern — same security posture
across both surfaces. Add a regression test that a site-scoped `AuditReadOnly` user
filtering on an out-of-scope site gets a 403 (or an empty page).
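
The intersection rule in option 1 can be sketched as a small pure helper. `AuditSiteScope` is a hypothetical name, and the sketch assumes site identifiers are strings and that an empty `PermittedSiteIds` means system-wide access, as the description states for `Admin`.

```csharp
#nullable enable
using System;
using System.Collections.Generic;
using System.Linq;

public static class AuditSiteScope
{
    // Returns null when no site restriction applies, otherwise the site IDs the query may cover.
    // An empty (non-null) result means the caller asked only for out-of-scope sites.
    public static IReadOnlyList<string>? Resolve(
        IReadOnlyCollection<string> requestedSiteIds,
        IReadOnlyCollection<string> permittedSiteIds)
    {
        // System-wide caller: honour the requested filter as-is.
        if (permittedSiteIds.Count == 0)
            return requestedSiteIds.Count == 0 ? null : requestedSiteIds.ToList();

        // Scoped caller with no explicit filter: restrict to the permitted sites.
        if (requestedSiteIds.Count == 0)
            return permittedSiteIds.ToList();

        // Scoped caller with a filter: keep only the overlap.
        return requestedSiteIds.Intersect(permittedSiteIds, StringComparer.Ordinal).ToList();
    }
}
```

The endpoint would map an empty result to either an empty page or a 403, whichever the team settles on.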
### ManagementService-020 — UpdateSmtpConfig returns and audits the SMTP Credentials field verbatim
| | |
|--|--|
| Severity | Medium |
| Category | Security |
| Status | Open |
| Location | `src/ScadaLink.ManagementService/ManagementActor.cs:1136`–`:1153` |
**Description**
`HandleUpdateSmtpConfig` reads the existing `SmtpConfiguration` entity, applies the
incoming command, and then **(a)** passes the full `config` object as the `afterState`
to `AuditAsync` (line 1151) — meaning the SMTP credential string is persisted in the
audit log — and **(b)** returns the full `config` to the caller (line 1152), which is
serialized via `SerializeResult` and sent back over HTTP. `SmtpConfiguration.Credentials`
carries the SMTP-Auth password (for `Basic`) or the OAuth2 client secret (for
`OAuth2ClientCredentials`); `SmtpConfiguration` has no `[JsonIgnore]` on this field
and `SerializeResult`'s `JsonSerializerOptions` does not exclude it. The pattern
parallels what ConfigurationDatabase-012 fixed for inbound API keys: a credential
artifact must not be echoed back through every read/audit path.
The credential is supplied by the operator in `UpdateSmtpConfigCommand.Credentials`,
so the caller already has it. But (1) anyone with read access to the audit log
(`OperationalAuditRoles`) can now retrieve every SMTP credential change verbatim — a
strictly larger blast radius than `Admin`-only `UpdateSmtpConfig`. (2) The serialized
`config` echo means the credential moves over the wire in the response even though the
caller has no need for it. (3) Any future read path that returns
`SmtpConfiguration` (`ListSmtpConfigsCommand` already does at line 1130) will leak
the stored credential too.
**Recommendation**
Three changes, in order of priority:
1. In `HandleUpdateSmtpConfig` and `HandleListSmtpConfigs`, project to a credential-free
shape before returning — e.g. `new { config.Id, config.Host, config.Port,
config.AuthType, config.FromAddress, config.TlsMode }`. Match the
`HandleListApiKeys` pattern.
2. In `AuditAsync` for the SMTP path, pass a credential-free `afterState` (the same
anonymous shape). The fact that *something* changed is auditable; the secret value
is not.
3. Tag `SmtpConfiguration.Credentials` with `[JsonIgnore]` in Commons (out-of-scope edit
for this module, but worth a follow-up). Alternatively, configure
`ResultSerializerOptions` with a property name policy that skips a known set of
credential field names — but a per-entity projection is cleaner.
Add regression tests: `UpdateSmtpConfig_DoesNotEchoCredentialsInResponse` and
`UpdateSmtpConfig_DoesNotPersistCredentialsInAuditLog`.
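
A sketch of the credential-free projection from changes 1 and 2. The `SmtpConfiguration` class below is a stand-in carrying only the fields named in the recommendation, with assumed property types; `SmtpConfigView` and `SmtpConfigProjection` are hypothetical names. A named record is used instead of an anonymous type so the same shape can be passed to both the response and the audit `afterState`.

```csharp
using System.Text.Json;

// Stand-in entity; property types are assumptions for this example.
public sealed class SmtpConfiguration
{
    public int Id { get; set; }
    public string Host { get; set; } = "";
    public int Port { get; set; }
    public string AuthType { get; set; } = "";
    public string FromAddress { get; set; } = "";
    public string TlsMode { get; set; } = "";
    public string Credentials { get; set; } = "";
}

// Credential-free shape: there is deliberately no Credentials member.
public sealed record SmtpConfigView(
    int Id, string Host, int Port, string AuthType, string FromAddress, string TlsMode);

public static class SmtpConfigProjection
{
    public static SmtpConfigView ToView(SmtpConfiguration config) =>
        new(config.Id, config.Host, config.Port, config.AuthType, config.FromAddress, config.TlsMode);

    // What a response or audit payload would contain after projection.
    public static string Serialize(SmtpConfiguration config) =>
        JsonSerializer.Serialize(ToView(config));
}
```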
### ManagementService-021 — Transport bundle handlers have zero test coverage
| | |
|--|--|
| Severity | Medium |
| Category | Testing coverage |
| Status | Open |
| Location | `tests/ScadaLink.ManagementService.Tests/ManagementActorTests.cs:1`; `src/ScadaLink.ManagementService/ManagementActor.cs:1717`–`:1897` |
**Description**
The three Transport (#24) bundle handlers — `HandleExportBundle`, `HandlePreviewBundle`,
`HandleImportBundle` (~180 lines of handler logic at the bottom of `ManagementActor.cs`)
— have **no tests** in `ManagementActorTests`. Specifically untested:
1. **Role gating.** `ExportBundleCommand` requires `Design`; `PreviewBundleCommand` and
`ImportBundleCommand` require `Admin`. No test asserts that the wrong role gets
`ManagementUnauthorized`. CLI-017 / CLI-018 just landed around bundle plumbing — a
future refactor that moves these commands between role groups in `GetRequiredRole`
would silently regress the gate.
2. **Name resolution in `HandleExportBundle`.** The inner `ResolveIds<T>` helper raises
`ManagementCommandException` for unknown names. The "all entity types" branch
(`cmd.All == true`) and the "missing name" branch are both untested.
3. **`HandleImportBundle` blocker rejection.** The handler aborts before `ApplyAsync`
when any `ConflictKind.Blocker` row is present; the produced error message is
curated and surfaced to the caller, but no test asserts the abort path or that the
importer's `ApplyAsync` was not called.
4. **Resolution dedupe.** `HandleImportBundle` dedupes `(EntityType, Name)` keys
last-write-wins — the dedupe is critical (CLI-014 was about it on the CLI side) but
has no actor-side regression test.
5. **`DecodeBundle` failure modes** (empty/non-base64 input) — both branches return
curated `ManagementCommandException` but neither is exercised.
6. **`ParseConflictPolicy`** for `"skip"`, `"overwrite"`, `"rename"`, and the
invalid-value branch: all untested.
Given the size and reach of the bundle path (cross-cutting central configuration
import), this gap is materially larger than usual for new handler code.
**Recommendation**
Add an `ImportBundleHandlerTests` suite covering:
- role gating for all three commands (`Design`/`Admin` mismatch -> `ManagementUnauthorized`),
- `ExportBundleCommand(All: true)` happy-path,
- `ExportBundleCommand` with an unknown name -> `ManagementError`,
- `ImportBundleCommand` with a `Blocker` row -> `ManagementError` and `ApplyAsync` not called,
- `ImportBundleCommand` with duplicate preview items -> dedupe to one resolution per (type, name),
- `DecodeBundle` empty/invalid base64,
- `ParseConflictPolicy` all four branches.
Use NSubstitute for `IBundleImporter` / `IBundleExporter` (no need for a real bundle in
the actor tests; the bundle round-trip belongs in `Transport` tests).
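
For the dedupe case in particular, the behaviour a regression test needs to pin down is "one resolution per `(EntityType, Name)`, last entry wins". The sketch below shows that rule in isolation; `BundleResolution` and `ResolutionDedupe` are hypothetical names and the record shape is assumed, so this is a model of the expected behaviour, not the handler's actual code.

```csharp
using System.Collections.Generic;
using System.Linq;

// Hypothetical resolution row; the real type belongs to the Transport component.
public sealed record BundleResolution(string EntityType, string Name, string Action);

public static class ResolutionDedupe
{
    // Keeps a single resolution per (EntityType, Name); a later entry replaces an earlier one.
    public static IReadOnlyList<BundleResolution> LastWriteWins(IEnumerable<BundleResolution> resolutions)
    {
        var byKey = new Dictionary<(string EntityType, string Name), BundleResolution>();
        foreach (var resolution in resolutions)
            byKey[(resolution.EntityType, resolution.Name)] = resolution;
        return byKey.Values.ToList();
    }
}
```

An actor-side test would feed duplicate preview items through `ImportBundleCommand` and assert on the resolutions handed to the substituted importer.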
### ManagementService-022 — Design doc is stale on Transport bundle commands, /api/audit/* endpoints, and CommandTimeout
| | |
|--|--|
| Severity | Low |
| Category | Design-document adherence |
| Status | Open |
| Location | `docs/requirements/Component-ManagementService.md:77`–`:175`, `:205`–`:209` |
**Description**
`Component-ManagementService.md` does not mention three pieces of shipped functionality:
1. **Transport (#24) bundle commands.** `ExportBundleCommand`, `PreviewBundleCommand`,
and `ImportBundleCommand` are dispatched at `ManagementActor.cs:350`–`:352` and
role-gated in `GetRequiredRole` (Design for Export; Admin for Preview/Import). The
design doc's "Message Groups" section enumerates Templates, Instances, Sites, Data
Connections, Deployments, External Systems, Notifications, Security, Audit Log,
Shared Scripts, Database Connections, Inbound API Methods, Health, and Remote
Queries — but has no "Transport" / "Bundles" group. The CLI now offers `bundle
export`/`preview`/`import` (per the recent CLI-017/018 work) and points
at these commands.
2. **`/api/audit/*` endpoints.** The doc's "HTTP Management API" section (line 52)
describes only `POST /management`. `AuditEndpoints.MapAuditAPI()` adds
`GET /api/audit/query` and `GET /api/audit/export` with their own auth-and-role
path mirroring `ManagementEndpoints` (intentionally — see the `AuditEndpoints` XML
docs), but the design doc gives no signal that the module exposes more than one
route group, no per-endpoint role mapping table, and no mention that the response
shape differs (keyset cursor vs. opaque page).
3. **`CommandTimeout`.** Line 209 still says "Reserved for future configuration —
e.g., command timeout overrides", but ManagementService-010 wired the option through
`ResolveAskTimeout`. The doc is stale.
**Recommendation**
Update `Component-ManagementService.md`:
- Add a "Transport" entry to "Message Groups" listing `ExportBundle`,
`PreviewBundle`, `ImportBundle` with their per-command roles. Cross-reference
`Component-Transport.md`.
- Add an "Audit Log HTTP API" subsection under "HTTP Management API" describing
`GET /api/audit/query` (keyset cursor, `OperationalAuditRoles`) and
`GET /api/audit/export` (csv/jsonl streaming, `AuditExportRoles`, parquet 501).
Note the deliberate divergence in the source-site query-string key
(`sourceSiteId` vs CentralUI's `site`).
- In the "Configuration" table, replace "Reserved for future configuration" with the
actual `CommandTimeout` semantics: "Max time the HTTP endpoint will Ask the
ManagementActor before returning HTTP 504; falls back to 30 s when unset or
non-positive."
### ManagementService-023 — HandleQueryDeployments unfiltered branch is N+1 on instance lookup
| | |
|--|--|
| Severity | Low |
| Category | Performance & resource management |
| Status | Open |
| Location | `src/ScadaLink.ManagementService/ManagementActor.cs:1276`–`:1295` |
**Description**
The site-scoped unfiltered branch of `HandleQueryDeployments` (added under
ManagementService-014) reads every `DeploymentRecord` via `GetAllDeploymentRecordsAsync`,
then for each *unique* `record.InstanceId` calls
`ITemplateEngineRepository.GetInstanceByIdAsync` to resolve the instance's
`SiteId`. The handler caches results in `instanceSiteCache` so each instance is loaded
at most once per call, but for a fleet with N distinct instances having deployment
history, the handler still issues N round-trips to the configuration database to
authorize a single query. With a large deployment history the cumulative DB hit can be
material; it also runs every time a site-scoped user opens the deployments page.
This is acceptable in steady state today (sites tend to have small fleets and few
deployments) but is a textbook N+1 read pattern, and on a busy day for a site-scoped
operator the cost will dominate the request. Admin and system-wide Deployment users
correctly skip the loop (they hit only `GetAllDeploymentRecordsAsync`).
**Recommendation**
Add a batch-resolve method to `ITemplateEngineRepository` — e.g.
`Task<IDictionary<int, int>> GetInstanceSiteIdsAsync(IEnumerable<int> instanceIds)`
backed by a single EF query
(`Instances.Where(i => instanceIds.Contains(i.Id)).Select(i => new { i.Id, i.SiteId })`).
`HandleQueryDeployments` would then issue exactly two queries on the unfiltered branch
(records + sites) regardless of fleet size. The change is additive to
`ITemplateEngineRepository` and out-of-module for the actual implementation, but the
handler change is local; a quick interim alternative is to project deployment records
to include the instance's `SiteId` at the repo level, which removes the second query
entirely.
Defer until a noticeable hot path emerges, but track it: this is the only N+1 in
`ManagementActor` once 002 / 014 are folded in.
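
The batched shape recommended above can be sketched with an in-memory stand-in for the repository. Everything here is illustrative: `IInstanceSiteLookup` isolates the one proposed method (the recommendation places it on `ITemplateEngineRepository`), `InMemoryInstanceSiteLookup` replaces the EF query, and integer site IDs for the permitted set are an assumption taken from the proposed `IDictionary<int, int>` signature.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;

public sealed record DeploymentRecord(int Id, int InstanceId);

// Hypothetical slice of the repository exposing only the proposed batch method.
public interface IInstanceSiteLookup
{
    Task<IDictionary<int, int>> GetInstanceSiteIdsAsync(IEnumerable<int> instanceIds);
}

// In-memory stand-in; CallCount makes the number of round-trips observable.
public sealed class InMemoryInstanceSiteLookup : IInstanceSiteLookup
{
    private readonly Dictionary<int, int> _siteByInstance;
    public int CallCount { get; private set; }

    public InMemoryInstanceSiteLookup(Dictionary<int, int> siteByInstance) =>
        _siteByInstance = siteByInstance;

    public Task<IDictionary<int, int>> GetInstanceSiteIdsAsync(IEnumerable<int> instanceIds)
    {
        CallCount++;
        IDictionary<int, int> result = instanceIds
            .Distinct()
            .Where(id => _siteByInstance.ContainsKey(id))
            .ToDictionary(id => id, id => _siteByInstance[id]);
        return Task.FromResult(result);
    }
}

public static class DeploymentScopeFilter
{
    // One batched lookup for all distinct instances, then an in-memory filter.
    public static async Task<IReadOnlyList<DeploymentRecord>> FilterAsync(
        IReadOnlyList<DeploymentRecord> records,
        IInstanceSiteLookup lookup,
        IReadOnlyCollection<int> permittedSiteIds)
    {
        var instanceIds = records.Select(r => r.InstanceId).Distinct().ToList();
        var siteByInstance = await lookup.GetInstanceSiteIdsAsync(instanceIds);
        var permitted = new HashSet<int>(permittedSiteIds);

        // Records whose instance cannot be resolved are dropped rather than exposed.
        return records
            .Where(r => siteByInstance.TryGetValue(r.InstanceId, out var siteId)
                        && permitted.Contains(siteId))
            .ToList();
    }
}
```

With this shape the lookup is called once per request regardless of how many distinct instances appear in the deployment history.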