docs(code-reviews): re-review batch 3 at 39d737e — Host, InboundAPI, ManagementService, NotificationService, Security

21 new findings: Host-012..015, InboundAPI-014..017, ManagementService-014..017, NotificationService-014..018, Security-012..015.
This commit is contained in:
Joseph Doherty
2026-05-17 00:48:25 -04:00
parent 89636e2bbf
commit 3b3760f026
6 changed files with 873 additions and 41 deletions

View File

@@ -5,10 +5,10 @@
| Module | `src/ScadaLink.ManagementService` |
| Design doc | `docs/requirements/Component-ManagementService.md` |
| Status | Reviewed |
| Last reviewed | 2026-05-16 |
| Last reviewed | 2026-05-17 |
| Reviewer | claude-agent |
| Commit reviewed | `9c60592` |
| Open findings | 0 (1 Deferred — see ManagementService-012) |
| Commit reviewed | `39d737e` |
| Open findings | 4 (1 Deferred — see ManagementService-012) |
## Summary
@@ -28,6 +28,24 @@ instances never disposed) and dead/unused configuration. None of the findings ar
crash-class, but the site-scope gaps are High severity because they are a real
authorization bypass with no workaround.
#### Re-review 2026-05-17 (commit `39d737e`)
All thirteen prior findings remain correctly closed; the source under
`src/ScadaLink.ManagementService` is byte-identical between the previously reviewed state
and `39d737e` (the resolution commits of findings 001013 are folded into the history at or
before `39d737e`). ManagementService-012 was re-checked and its **Deferred** status still
holds: `ManagementEnvelope.Command` is still typed `object`, and the marker-interface fix
still belongs in the Commons module, outside this module's edit scope — nothing has changed
to make it actionable here. This re-review re-ran the full 10-category checklist against the
current sources and surfaced **four new findings**. The dominant theme is the same
site-scope authorization gap that findings 001/002 closed: `HandleQueryDeployments`
(`QueryDeploymentsCommand`) was overlooked by that sweep and still performs no site-scope
enforcement, letting a site-scoped Deployment user read deployment history for any site
(014, High). The remaining three are lower severity: a non-atomic multi-override mutation
that can leave an instance partially modified after an error (015, Medium), raw exception
messages from unexpected faults being returned verbatim to HTTP callers (016, Low), and
`QueryDeploymentsCommand` having no test coverage at all (017, Low).
## Checklist coverage
| # | Category | Examined | Notes |
@@ -551,3 +569,144 @@ infrastructure, so the testable command-parsing/dispatch logic was extracted int
`ParseCommand` helper and covered instead. Tests: `ParseCommand_*` (5),
`SerializeResult_*` (2), `UnknownCommandType_FaultMappedToManagementError`, plus the
pre-existing site-scope and DebugStreamHub suites. `dotnet test` -> 48 passed.
### ManagementService-014 — HandleQueryDeployments bypasses site-scope enforcement
| | |
|--|--|
| Severity | High |
| Category | Security |
| Status | Open |
| Location | `src/ScadaLink.ManagementService/ManagementActor.cs:306`, `:1174` |
**Description**
`QueryDeploymentsCommand` is gated to the `Deployment` role by `GetRequiredRole`
(`ManagementActor.cs:170``:177`), and the design document's Authorization section states
"Site scoping is enforced for site-scoped Deployment users" and explicitly lists deployments
among the Deployment-role operations. `HandleQueryDeployments` makes no call to
`EnforceSiteScope` / `EnforceSiteScopeForInstance` / `EnforceSiteScopeForIdentifier`: with no
`InstanceId` it returns `IDeploymentManagerRepository.GetAllDeploymentRecordsAsync()` (every
deployment record across all sites), and with an `InstanceId` it returns that instance's
deployment history with no check that the instance's site is within the caller's permitted
set. A site-scoped Deployment user scoped to site A can therefore enumerate deployment
records for instances at site B — instance IDs, `DeployedBy` (operator usernames),
timestamps, deployment status, and `ErrorMessage` content — by issuing `QueryDeployments`
with or without an out-of-scope `InstanceId`. This is the same authorization-bypass class as
the resolved findings 001/002, on a handler that sweep did not cover; it is `DispatchCommand`'s
only `Deployment`-role handler with no scope enforcement.
**Recommendation**
Thread `AuthenticatedUser` into `HandleQueryDeployments` (the dispatch case at line 306
already has `user` in scope). When `cmd.InstanceId` is supplied, call
`EnforceSiteScopeForInstance` before querying. When it is not supplied, filter the returned
`DeploymentRecord` list to the caller's permitted sites — resolve each record's instance to
its `SiteId` (or join through a site-aware repository query) and drop records for sites
outside `PermittedSiteIds`, mirroring the `HandleListInstances` / `HandleListSites` filter
pattern. Add a regression test for a site-scoped user against in-scope and out-of-scope
instances.
**Resolution**
_Unresolved._
### ManagementService-015 — HandleSetInstanceOverrides applies overrides non-atomically
| | |
|--|--|
| Severity | Medium |
| Category | Error handling & resilience |
| Status | Open |
| Location | `src/ScadaLink.ManagementService/ManagementActor.cs:647``:659` |
**Description**
`HandleSetInstanceOverrides` iterates `cmd.Overrides` and calls
`InstanceService.SetAttributeOverrideAsync` once per attribute, throwing
`InvalidOperationException` on the first `result.IsSuccess == false`. Each
`SetAttributeOverrideAsync` call persists independently, so if the command supplies five
overrides and the third fails (e.g. an unknown attribute name or a validation error), the
first two overrides are already committed to the configuration database while the caller
receives a `ManagementError`. The instance is left partially mutated in a state the operator
neither sees nor requested, and the per-instance operation lock referenced in the design's
deployment decisions does not protect against this because the partial writes are committed
before the throw. A retry of the same command then re-applies the already-applied overrides.
**Recommendation**
Make the multi-override mutation all-or-nothing: either validate every requested override up
front before applying any, or apply all overrides within a single transaction / unit-of-work
so a mid-batch failure rolls back the earlier writes. If `InstanceService` cannot offer a
batch method, at minimum document the partial-application behaviour on `SetInstanceOverridesCommand`
and have the handler report which overrides were applied before the failure so the caller
can reconcile.
**Resolution**
_Unresolved._
### ManagementService-016 — Unexpected exception messages returned verbatim to HTTP callers
| | |
|--|--|
| Severity | Low |
| Category | Security |
| Status | Open |
| Location | `src/ScadaLink.ManagementService/ManagementActor.cs:121``:131` |
**Description**
`MapFault` maps any non-`SiteScopeViolationException` fault to
`new ManagementError(correlationId, cause.Message, "COMMAND_FAILED")`, and
`ManagementEndpoints.HandleRequest` returns that `Error` string directly in the HTTP 400
body. For handler-thrown `InvalidOperationException`s carrying a curated `result.Error`
message this is intended and safe. But the same path also surfaces the raw `.Message` of
unanticipated exceptions — a `SqlException` (which can include server/database names and
constraint details), a `DbUpdateException`, an `ArgumentException` from `Enum.Parse` on a
malformed `DataType`/`TriggerType` value, or a `NullReferenceException` — straight to the
external CLI/HTTP client. This is a minor internal-detail disclosure surface: the exception
text is already logged server-side with full context, so the client does not need the raw
message.
**Recommendation**
Distinguish handler-curated failures from unexpected faults. Have handlers throw a dedicated
exception type (e.g. `ManagementCommandException`) for messages that are safe to surface, and
in `MapFault` return that message for the known type while returning a generic
"An internal error occurred (CorrelationId=...)" string for everything else — the operator
can still correlate to the server log via the correlation ID.
**Resolution**
_Unresolved._
### ManagementService-017 — QueryDeploymentsCommand has no test coverage
| | |
|--|--|
| Severity | Low |
| Category | Testing coverage |
| Status | Open |
| Location | `tests/ScadaLink.ManagementService.Tests/ManagementActorTests.cs:1` |
**Description**
`QueryDeploymentsCommand` / `HandleQueryDeployments` is exercised by no test in
`ManagementActorTests`. There is no test that it requires the `Deployment` role, no test of
the `InstanceId`-filtered versus unfiltered branches, and — because the handler performs no
site-scope enforcement at all — no test that would have caught finding 014. The deployment
query is one of the operations the design's Authorization section calls out for site
scoping, yet it is the only `Deployment`-role command with neither an authorization test nor
a site-scope test.
**Recommendation**
Add tests for `QueryDeploymentsCommand`: a role test (Design/no-role caller →
`ManagementUnauthorized`), branch coverage for the `InstanceId`-filtered and unfiltered
repository calls, and — once finding 014 is fixed — site-scope tests for a site-scoped
Deployment user against in-scope and out-of-scope deployment records.
**Resolution**
_Unresolved._