Files

T

Joseph Doherty d190345ef0 test(coverage): close Theme 8 — 13 test-coverage findings, +35 tests

13 well-bounded test-coverage gaps closed across 11 test projects.
Net +35 regression tests; no production code changes except the
SiteEventLogger src reference unchanged (W3 redacted only test code).

Test additions:
- CLI-022: CommandTreeTests pinned-count assertion bumped 14→16 and
  3 InlineData rows added for the audit + bundle command groups.
- Commons-020: new TransportRecordsTests covers BundleManifest /
  ExportSelection / ImportPreview / ImportResolution / ImportResult —
  ctor + System.Text.Json round-trip + record-equality (14 tests).
- CD-024: SPLIT-RANGE failure-continuation now under
  EnsureLookahead_SecondSplitThrows_LoopAborts_FirstBoundaryStillCommitted
  (Skippable MS-SQL fixture); production-shape rowversion delete
  asserted by DeleteDeploymentRecord_CurrentRowVersion_StubAttachPath_DeleteSucceeds.
- CentralUI-033: new QueryStringDrillInTests with 4 bUnit cases for
  Transport + SiteCalls drill-in / query-string handling.
- DM-024: probe actors (ReconcileProbeActor, SerializationProbeActor,
  ArtifactProbeActor) refactored from static fields to per-test instances
  (Interlocked on counter) — all 31 callers updated; no production
  changes required.
- HM-022: real-time PeriodicTimer test flake fixed by replacing
  fixed-budget Task.Delay with a RunLoopUntil poll-until-condition
  helper (5s/25ms). Production loop untouched.
- InboundAPI-023: new EndpointExtensionsTests covers the
  POST /api/{methodName} composition wiring via TestServer (7 cases:
  happy path, missing key 401, unknown method 403, invalid JSON 400,
  missing param 400, script-throws 500 sanitised, AuditActorItemKey
  stash invariant).
- MgmtSvc-021: 6 new ManagementActorTests cover the Transport bundle
  handlers (role gate for Export/Preview/Import, unknown-name
  ManagementCommandException, blocker-rejection, dedupe last-write-wins).
- SCA-006: SiteCallQueryRequest_StuckOnly_CursorAtNonStuckBoundary_SkipsToNextStuckRow
  pins the missing boundary case.
- SEL-023: stress-test `bool stop` promoted to `volatile bool` for
  cross-thread visibility under release/JIT.

Verify-only resolutions:
- NS-024: closed by NS-019 (commit ac96b83 deletion of
  NotificationDeliveryService + its test file). No edits needed.
- NotifOutbox-008: FallbackMaxRetries/FallbackRetryDelay are private
  forward-compat constants returned only when no SMTP-config row exists
  (in which case EmailNotificationDeliveryAdapter returns Permanent,
  bypassing the values entirely). Marked Resolved with note.
- Transport-010: Overwrite child-collection sync covered by the T-001/
  T-002 tests added in commit e3ca9af; per-IP throttle by
  BundleUnlockRateLimiterTests; failed-session retention by
  BundleSessionStoreTests; T-009 closed structurally via AsyncLocal.
  Marked Resolved by reference.

Build clean; all 11 affected test suites green. README regenerated:
33 open (was 46).

2026-05-28 08:21:03 -04:00

66 KiB

Raw Blame History

Code Review — ManagementService

Field	Value
Module	`src/ScadaLink.ManagementService`
Design doc	`docs/requirements/Component-ManagementService.md`
Status	Reviewed
Last reviewed	2026-05-28
Reviewer	claude-agent
Commit reviewed	`1eb6e97`
Open findings	0 (1 Deferred — see ManagementService-012)

Summary

The ManagementService module is a thin command-dispatch layer: a single ManagementActor fronts every administrative operation, an HTTP POST /management endpoint authenticates and forwards to it, and a SignalR DebugStreamHub provides real-time debug streaming. The code is consistently structured and the role-based authorization gate (GetRequiredRole) is broadly correct and well tested. However, the review surfaced a significant security theme: site-scope enforcement, which the design document requires for instance- and site-targeted Deployment operations, is applied inconsistently — several query handlers and all remote-query/debug handlers perform no site-scope check at all, allowing a site-scoped Deployment user to read or act on sites outside their scope. A second theme is Akka.NET convention drift: the actor offloads all work to Task.Run instead of using PipeTo, declares no supervision strategy, and the contract messages carry a loosely-typed object payload. There are also resource-management defects in the HTTP endpoint (JsonDocument instances never disposed) and dead/unused configuration. None of the findings are crash-class, but the site-scope gaps are High severity because they are a real authorization bypass with no workaround.

Re-review 2026-05-17 (commit `39d737e`)

All thirteen prior findings remain correctly closed; the source under src/ScadaLink.ManagementService is byte-identical between the previously reviewed state and 39d737e (the resolution commits of findings 001–013 are folded into the history at or before 39d737e). ManagementService-012 was re-checked and its Deferred status still holds: ManagementEnvelope.Command is still typed object, and the marker-interface fix still belongs in the Commons module, outside this module's edit scope — nothing has changed to make it actionable here. This re-review re-ran the full 10-category checklist against the current sources and surfaced four new findings. The dominant theme is the same site-scope authorization gap that findings 001/002 closed: HandleQueryDeployments (QueryDeploymentsCommand) was overlooked by that sweep and still performs no site-scope enforcement, letting a site-scoped Deployment user read deployment history for any site (014, High). The remaining three are lower severity: a non-atomic multi-override mutation that can leave an instance partially modified after an error (015, Medium), raw exception messages from unexpected faults being returned verbatim to HTTP callers (016, Low), and QueryDeploymentsCommand having no test coverage at all (017, Low).

Re-review 2026-05-28 (commit `1eb6e97`)

All seventeen prior findings remain correctly closed; ManagementService-012 is still the only Deferred entry (marker-interface on ManagementEnvelope.Command still belongs in the Commons module). The module has grown substantially since the last review (+1997 lines): the Transport (#24) bundle commands (ExportBundle/PreviewBundle/ImportBundle) have been added to ManagementActor, and a new AuditEndpoints.cs (/api/audit/query and /api/audit/export) ships alongside the existing /management endpoint. This re-review re-ran the full 10-category checklist and surfaced six new findings. The dominant theme is the same authorization gap that findings 001/002/003/014 closed for the ManagementActor, now resurfacing in the new surfaces: QueryAuditLogCommand has no role gate at all (018, High) — any authenticated user can read the configuration audit log via /management, even though the parallel /api/audit/query requires OperationalAuditRoles. The new /api/audit/{query,export} endpoints build an AuthenticatedUser with PermittedSiteIds but never enforce site scope (019, Medium) — although audit roles are not site-scoped by design, the user-supplied sourceSiteId filter is honoured verbatim. HandleUpdateSmtpConfig returns the full SmtpConfiguration entity (including the Credentials field, which can carry SMTP passwords / OAuth2 client secrets) in the response and audit row (020, Medium). The Transport (#24) bundle commands have zero test coverage in ManagementActorTests (021, Medium) — neither role gating nor success/error paths. The Component-ManagementService.md design doc is stale on three fronts: it does not mention Transport bundle commands, the /api/audit/* endpoints, or the now-wired CommandTimeout option (022, Low). Finally, HandleQueryDeployments issues one GetInstanceByIdAsync per unique instance ID when filtering for a site-scoped user — an N+1 read pattern on the unfiltered branch (023, Low).

Checklist coverage

#	Category	Examined	Notes
1	Correctness & logic bugs	+	`HandleResolveRoles` builds `RoleMapper` by hand; `ResolveRolesCommand` is a stale dispatch path. See 008, 011.
2	Akka.NET conventions	+	`Task.Run` instead of `PipeTo`, no supervision strategy, `object`-typed message payload. See 004, 005, 012.
3	Concurrency & thread safety	+	Actor is stateless so `Task.Run` does not corrupt state, but it defeats actor-thread serialization (004). `Sender` correctly captured to a local before the closure.
4	Error handling & resilience	+	Exceptions are caught and mapped uniformly; `SiteScopeViolationException` mapped to `Unauthorized`. Audit-logging consistency issue noted in 009.
5	Security	+	Site-scope enforcement missing on query/remote/debug paths. See 001, 002, 003.
6	Performance & resource management	+	`JsonDocument` instances never disposed in the HTTP endpoint. See 006.
7	Design-document adherence	+	Design doc states remote queries enforce site scoping; code does not. `ManagementServiceOptions` reserved-for-future config is unused. See 001, 010.
8	Code organization & conventions	+	Mixed serializers (Newtonsoft in actor, System.Text.Json in endpoint); inconsistent audit logging across mutations. See 007, 009.
9	Testing coverage	+	Authorization is well covered; site-scope enforcement, the HTTP endpoint, `DebugStreamHub`, and remote-query handlers have no tests. See 013.
10	Documentation & comments	+	XML docs are accurate where present; `ManagementServiceOptions` and `ResolveRolesCommand` paths are undocumented dead code (010, 011).

Re-review (2026-05-28, 1eb6e97):

#	Category	Examined	Notes
1	Correctness & logic bugs	+	`HandleImportBundle` correctly dedupes resolutions per (entity,name); `ParseDocument` still allocates a `JsonDocument.Parse("{}")` on the failure path but the caller's `using` disposes it. No new defects.
2	Akka.NET conventions	+	PipeTo dispatch from 004 is intact; supervision strategy from 005 is intact; `Sender` correctly captured to local before PipeTo. No new findings.
3	Concurrency & thread safety	+	Bundle handlers `await` cleanly; `BundleSession` is not cleaned up if `PreviewAsync`/`ApplyAsync` throws, but that is an `IBundleImporter` contract concern outside this module. No new findings.
4	Error handling & resilience	+	`ManagementCommandException` from 016 is applied consistently across the new bundle handlers (curated `CryptographicException`/`ArgumentException` paths). No new findings.
5	Security	+	`QueryAuditLogCommand` has no role gate (018, High). New `/api/audit/*` endpoints build `PermittedSiteIds` but never enforce them (019, Medium). `HandleUpdateSmtpConfig` returns + audits `Credentials` verbatim (020, Medium).
6	Performance & resource management	+	`HandleQueryDeployments` unfiltered-with-scope branch is N+1 on instance lookups (023, Low). Request body up to 200 MB read into a single `string` in `HandleRequest` (acceptable per Transport bundle requirement).
7	Design-document adherence	+	`Component-ManagementService.md` is stale on Transport bundle commands, `/api/audit/*` endpoints, and the now-wired `CommandTimeout` (022, Low).
8	Code organization & conventions	+	`AuditEndpoints` duplicates the Basic Auth → LDAP → roles flow from `ManagementEndpoints` (~50 lines). Acknowledged in `AuditEndpoints` XML but worth tracking. No new finding raised.
9	Testing coverage	+	Transport bundle commands have zero `ManagementActorTests` coverage — neither role gating nor handler logic (021, Medium).
10	Documentation & comments	+	New `AuditEndpoints` XML doc is high quality. `Component-ManagementService.md` not updated for Transport/Audit endpoints (022 covers).

Findings

ManagementService-001 — Remote-query and debug-snapshot handlers bypass site-scope enforcement


Severity	High
Category	Security
Status	Resolved
Location	`src/ScadaLink.ManagementService/ManagementActor.cs:1465`, `:1481`, `:1493`, `:641`, `:649`

Description

The design document (Component-ManagementService.md, Authorization section) states that for Deployment users "Site scoping is enforced for site-scoped Deployment users" and lists "debug snapshot, parked message queries, site event log queries" among the Deployment-role operations. HandleQueryEventLogs, HandleQueryParkedMessages, HandleDebugSnapshot, HandleRetryParkedMessage, and HandleDiscardParkedMessage make no call to EnforceSiteScope or EnforceSiteScopeForInstance. A Deployment user scoped to site A can therefore query event logs / parked messages of site B, retry or discard another site's parked messages, and pull a debug snapshot of any instance simply by supplying a different SiteIdentifier or InstanceId. This is an authorization bypass with no workaround.

Recommendation

In each of these handlers resolve the target site and call site-scope enforcement before delegating to CommunicationService. For the SiteIdentifier-keyed handlers, look up the Site by identifier and enforce against Site.Id; for DebugSnapshotCommand the instance is already loaded — call EnforceSiteScope(user, instance.SiteId) (which requires threading AuthenticatedUser into these handlers, currently dropped).

Resolution

Resolved 2026-05-16 (commit <pending>). Threaded AuthenticatedUser into HandleQueryEventLogs, HandleQueryParkedMessages, HandleRetryParkedMessage, HandleDiscardParkedMessage, and HandleDebugSnapshot; added an EnforceSiteScopeForIdentifier helper that resolves the site by identifier and applies EnforceSiteScope. HandleDebugSnapshot enforces against the already-loaded instance's SiteId. Regression tests: QueryEventLogs_OutOfScopeForSiteScopedUser_ReturnsUnauthorized, QueryParkedMessages_OutOfScopeForSiteScopedUser_ReturnsUnauthorized, RetryParkedMessage_OutOfScopeForSiteScopedUser_ReturnsUnauthorized, DiscardParkedMessage_OutOfScopeForSiteScopedUser_ReturnsUnauthorized, DebugSnapshot_OutOfScopeForSiteScopedUser_ReturnsUnauthorized.

ManagementService-002 — Single-entity query handlers leak data across site scope


Severity	High
Category	Security
Status	Resolved
Location	`src/ScadaLink.ManagementService/ManagementActor.cs:510`, `:673`, `:733`, `:774`, `:631`, `:624`

Description

HandleListInstances and HandleListSites correctly filter their results by the user's PermittedSiteIds, but the single-entity query handlers do not. HandleGetInstance, HandleGetSite, HandleListAreas, and HandleGetDataConnection fetch by ID with no site-scope check, so a site-scoped Deployment user can read any instance, site, area tree, or data connection by ID even though that site is excluded from their scope. The list endpoints having a filter while the get-by-id endpoints do not is an inconsistency that undermines the scoping model. (HandleGetDeploymentDiff and HandleListInstanceAlarmOverrides do enforce scope, confirming the omission elsewhere is unintentional.)

Recommendation

Apply EnforceSiteScopeForInstance in HandleGetInstance, and EnforceSiteScope against the resolved site ID in HandleGetSite, HandleListAreas, and HandleGetDataConnection (for data connections, scope by the connection's SiteId).

Resolution

Resolved 2026-05-16 (commit <pending>). HandleGetInstance, HandleGetSite, HandleGetDataConnection now take AuthenticatedUser and call EnforceSiteScope against the resolved entity's site ID (instance SiteId, site Id, data-connection SiteId); HandleListAreas enforces against the requested SiteId before querying. Regression tests: GetInstance_OutOfScopeForSiteScopedUser_ReturnsUnauthorized, GetInstance_InScopeForSiteScopedUser_ReturnsSuccess, GetSite_OutOfScopeForSiteScopedUser_ReturnsUnauthorized, GetSite_OutOfScopeForAdminUser_ReturnsSuccess, ListAreas_OutOfScopeForSiteScopedUser_ReturnsUnauthorized, GetDataConnection_OutOfScopeForSiteScopedUser_ReturnsUnauthorized.

ManagementService-003 — DebugStreamHub.SubscribeInstance performs no per-instance authorization


Severity	High
Category	Security
Status	Resolved
Location	`src/ScadaLink.ManagementService/DebugStreamHub.cs:104`

Description

OnConnectedAsync authenticates the WebSocket connection and verifies the caller holds the Deployment role, but SubscribeInstance(int instanceId) accepts any instance ID and starts a stream without checking that the authenticated user is scoped to that instance's site. A site-scoped Deployment user can therefore subscribe to the live debug stream (attribute values, alarm states) of an instance belonging to a site outside their scope. This is the streaming equivalent of finding 001/002.

Recommendation

Resolve the instance's site inside SubscribeInstance and reject the subscription if the authenticated user's permitted-site set does not include it. The authenticated identity established in OnConnectedAsync must be persisted on the connection (e.g. in Context.Items) so it is available to SubscribeInstance.

Resolution

Resolved 2026-05-16 (commit <pending>). OnConnectedAsync now persists the resolved roles and PermittedSiteIds in Context.Items. SubscribeInstance resolves the instance's site via ITemplateEngineRepository and rejects the subscription (sending OnStreamTerminated) when the new pure DebugStreamHub.IsInstanceAccessAllowed check fails. The check grants access for the Admin role or system-wide Deployment (empty permitted set) and otherwise requires the instance's site in the permitted set. Regression tests: IsInstanceAccessAllowed_SiteScopedUser_OutOfScopeInstance_Denied, IsInstanceAccessAllowed_SiteScopedUser_InScopeInstance_Allowed, IsInstanceAccessAllowed_SystemWideDeployment_AnySiteAllowed, IsInstanceAccessAllowed_AdminRole_BypassesSiteScope, IsInstanceAccessAllowed_AdminRoleCheck_IsCaseInsensitive.

ManagementService-004 — Actor offloads work to Task.Run instead of using PipeTo


Severity	Medium
Category	Akka.NET conventions
Status	Resolved
Location	`src/ScadaLink.ManagementService/ManagementActor.cs:61`

Description

HandleEnvelope runs every command on a thread-pool thread via Task.Run(async () => ...) and replies from inside the continuation. This is the anti-pattern the project's Akka.NET conventions warn against — the canonical approach is to start the async work and PipeTo its result back to Self/Sender. Although Sender is correctly copied to a local before the closure, the current code: (a) lets multiple commands execute fully concurrently with no actor-thread serialization, so the actor provides no ordering or back-pressure guarantees and is an actor in name only; (b) cannot be paused, supervised, or made to honour a mailbox bound; (c) is shielded from synchronous faults only because every path is inside the try/catch — any future code path that throws synchronously before the Task.Run body would escape it.

Recommendation

Replace Task.Run with a method that returns the Task and PipeTo the mapped result (ManagementSuccess/ManagementError/ManagementUnauthorized) back to the captured sender, mapping faults in the PipeTo failure continuation. If genuine parallelism is desired, make that explicit with a router/dispatcher rather than ad-hoc Task.Run.

Resolution

Resolved 2026-05-16 (commit pending). Confirmed: HandleEnvelope ran every command via Task.Run and replied from inside the continuation, contrary to the project's PipeTo convention. Replaced it with a ProcessCommand method returning a Task<object> and PipeTo(sender, success, failure); faults are now mapped uniformly in a MapFault failure continuation (SiteScopeViolationException -> ManagementUnauthorized, otherwise ManagementError), which also unwraps AggregateException. Regression test: UnknownCommandType_FaultMappedToManagementError. Existing success/error/unauthorized mapping tests confirm behaviour is preserved.

ManagementService-005 — ManagementActor declares no supervision strategy


Severity	Low
Category	Akka.NET conventions
Status	Resolved
Location	`src/ScadaLink.ManagementService/ManagementActor.cs:33`

Description

The project conventions call for explicit supervision strategies (Resume for coordinator actors). ManagementActor is a long-lived coordinator-style actor but overrides no SupervisorStrategy and defines no PreRestart/PostRestart behaviour. In practice it spawns no children so the default strategy is rarely exercised, but an explicit strategy should still be declared for clarity and to match the documented convention; it also matters if children are added later (e.g. if finding 004 introduces worker actors).

Recommendation

Add an explicit protected override SupervisorStrategy SupervisorStrategy() returning a Resume-based strategy, consistent with other central coordinator actors.

Resolution

Resolved 2026-05-16 (commit pending). Confirmed: ManagementActor declared no SupervisorStrategy. Added a public static SupervisorStrategy CreateSupervisorStrategy() factory returning an unbounded OneForOneStrategy with a Directive.Resume decider, and a protected override SupervisorStrategy() that delegates to it — matching the Resume-based convention of CentralCommunicationActor/SiteCommunicationActor. The actor spawns no children today, so this is a forward-looking correctness fix. Regression tests: CreateSupervisorStrategy_ReturnsOneForOneStrategy, CreateSupervisorStrategy_ResumesOnArbitraryException, CreateSupervisorStrategy_ResumesIndefinitely (new ManagementActorSupervisionTests.cs).

ManagementService-006 — JsonDocument instances never disposed in the HTTP endpoint


Severity	Medium
Category	Performance & resource management
Status	Resolved
Location	`src/ScadaLink.ManagementService/ManagementEndpoints.cs:83`, `:112`

Description

JsonDocument is IDisposable (it rents buffers from a pooled ArrayPool). HandleRequest parses the request body into doc at line 83 and never disposes it, and line 112 (JsonDocument.Parse("{}")) allocates a second document inline that is also never disposed. Every management HTTP call therefore leaks pooled buffers, increasing GC pressure and pool churn under load.

Recommendation

Wrap the parsed document in using var doc = .... For the empty-payload fallback, avoid allocating a JsonDocument entirely — deserialize from the literal string "{}"/an empty object, or restructure so the fallback path does not parse a throwaway document.

Resolution

Resolved 2026-05-16 (commit pending). Confirmed: the request JsonDocument was never disposed and the empty-payload path allocated a second throwaway JsonDocument. Extracted request parsing into a testable ManagementEndpoints.ParseCommand helper that wraps the document in using; the missing-payload case now deserializes from the "{}" literal string rather than parsing a throwaway document. Regression tests: ParseCommand_WithExplicitPayload_DeserializesIntoCommandType, ParseCommand_WithMissingPayload_DeserializesParameterlessCommand, ParseCommand_WithInvalidJson_ReturnsFailure, ParseCommand_WithMissingCommandField_ReturnsFailure, ParseCommand_WithUnknownCommand_ReturnsFailure.

ManagementService-007 — Inconsistent and cycle-prone serialization of repository entities


Severity	Medium
Category	Code organization & conventions
Status	Resolved
Location	`src/ScadaLink.ManagementService/ManagementActor.cs:67`; `src/ScadaLink.ManagementService/ManagementEndpoints.cs:113`

Description

The actor serializes every command result with Newtonsoft.Json (JsonConvert.SerializeObject) while the HTTP endpoint deserializes payloads with System.Text.Json. Beyond the inconsistency, JsonConvert.SerializeObject is applied directly to EF-backed entities returned by repositories (e.g. Site, DataConnection, NotificationList with a Recipients collection, Template with children). With default Newtonsoft settings any bidirectional navigation property produces a JsonSerializationException for self-referencing loops, and even without cycles this serializes lazy/navigation state the CLI does not expect.

Recommendation

Standardise on one serializer (the rest of the HTTP path uses System.Text.Json). Serialize explicit DTOs / projections rather than EF entities, or configure ReferenceLoopHandling.Ignore and ignore navigation properties. Verify that handlers returning rich entity graphs (HandleGetTemplate, HandleUpdateNotificationList) round-trip correctly.

Resolution

Resolved 2026-05-16 (commit pending). Confirmed: the actor serialized results with Newtonsoft.Json (not even a direct package reference) while the HTTP endpoint uses System.Text.Json. Standardised the actor on System.Text.Json via a new ManagementActor.SerializeResult helper using a shared JsonSerializerOptions with ReferenceHandler.IgnoreCycles (cycle-safe for EF entity graphs) and camelCase naming (matches the CLI's case-insensitive deserializer). Removed the Newtonsoft.Json import. Regression tests: SerializeResult_WithCyclicGraph_DoesNotThrow, SerializeResult_UsesCamelCasePropertyNames.

ManagementService-008 — HandleResolveRoles constructs RoleMapper manually instead of via DI


Severity	Low
Category	Correctness & logic bugs
Status	Resolved
Location	`src/ScadaLink.ManagementService/ManagementActor.cs:285`

Description

Every other handler resolves its collaborators from the scoped IServiceProvider. HandleResolveRoles instead does new RoleMapper(sp.GetRequiredService<ISecurityRepository>()), bypassing DI. If RoleMapper ever gains a dependency, caching, or options, this hand-built instance silently diverges from the DI-registered one. It is also inconsistent with ManagementEndpoints, which resolves RoleMapper from DI.

Recommendation

Resolve RoleMapper via sp.GetRequiredService<RoleMapper>() like every other dependency.

Resolution

Resolved 2026-05-16 (commit pending). Confirmed: HandleResolveRoles did new RoleMapper(sp.GetRequiredService<ISecurityRepository>()), bypassing the AddScoped<RoleMapper>() DI registration. The hand-built RoleMapper lived only inside HandleResolveRoles, which is itself the dead-code dispatch path removed under finding 011 (the two-step ResolveRoles flow is retired). Resolving 011 by deleting the ResolveRolesCommand dispatch case and HandleResolveRoles handler also removes the only manually-constructed RoleMapper in the module, so the DI-bypass no longer exists. No remaining new RoleMapper in src/ScadaLink.ManagementService. Regression covered by ResolveRolesCommand_IsNoLongerDispatched_ReturnsManagementError.

ManagementService-009 — Audit logging applied inconsistently across mutating handlers


Severity	Low — re-triaged from Medium; the claimed audit gap does not exist (see Description), leaving only an undocumented-convention issue.
Category	Code organization & conventions
Status	Resolved
Location	`src/ScadaLink.ManagementService/ManagementActor.cs:357`, `:1134`, `:1085`, `:526`, `:1275`

Description

The design doc states "All mutating operations are audit logged." Some handlers call AuditAsync explicitly (HandleCreateInstance, HandleCreateSite, all repository-direct external-system/notification/security/area mutations), but the handlers that delegate to a domain service do not — HandleCreateTemplate/HandleUpdateTemplate/HandleDeleteTemplate, all template-member handlers (HandleAddAttribute ... HandleDeleteComposition), template-folder handlers, shared-script handlers, HandleDeployArtifacts, HandleDeployInstance, HandleEnableInstance/Disable/Delete, and the instance-binding/override handlers.

Re-triage (2026-05-16): the original finding claimed this "creates a real risk of silent audit gaps for template authoring and deployment operations." That claim was verified against the actual sources and is false. Every domain service the delegating handlers call — TemplateService, SharedScriptService, InstanceService, AreaService, SiteService, TemplateFolderService, DeploymentService, ArtifactDeploymentService — injects IAuditService and calls LogAsync on every mutation (grep confirms an _auditService.LogAsync call after each Create/Update/Delete in TemplateService.cs, DeploymentService.cs, ArtifactDeploymentService.cs, etc.). There is therefore no audit gap; if anything, adding explicit AuditAsync to a delegating handler would double-log. The genuine issue is purely organizational: the two-layer split (actor audits repo-direct mutations, services audit their own) was undocumented, which is what made it look risky. This is a Low-severity code-organization issue, not a Medium error-handling/resilience defect.

Recommendation

Document the chosen contract so the split cannot be misread as a gap. (The original alternative — moving all auditing into the actor — would require un-auditing eight services and is not warranted given they already audit correctly.)

Resolution

Resolved 2026-05-16 (commit pending). Re-triaged to Low / Code organization after verifying all eight delegated-to services audit internally — no audit gap exists. Documented the two-layer audit contract in an XML <remarks> block on ManagementActor.AuditAsync: repository-direct mutations call AuditAsync; service-delegating handlers must not, because the services own auditing and a duplicate call would double-log. No behavioural change, so no new regression test; existing CreateInstanceCommand_WithDeploymentRole_ReturnsSuccess covers the explicit-audit path.

ManagementService-010 — ManagementServiceOptions.CommandTimeout is defined but never used


Severity	Low
Category	Design-document adherence
Status	Resolved
Location	`src/ScadaLink.ManagementService/ManagementServiceOptions.cs:5`; `src/ScadaLink.ManagementService/ManagementEndpoints.cs:16`

Description

ManagementServiceOptions.CommandTimeout is bound from configuration in ServiceCollectionExtensions, but no code reads it. The HTTP endpoint instead hard-codes AskTimeout = TimeSpan.FromSeconds(30). The design doc describes the options section as "Reserved for future configuration — e.g., command timeout overrides", yet a concrete CommandTimeout property already exists and is silently ignored, so an operator who sets it in appsettings.json gets no effect.

Recommendation

Either consume ManagementServiceOptions.CommandTimeout in ManagementEndpoints.HandleRequest (inject IOptions<ManagementServiceOptions>), or remove the property until it is wired up so configuration cannot be set with no effect.

Resolution

Resolved 2026-05-16 (commit pending). Confirmed: ManagementEndpoints hard-coded AskTimeout = TimeSpan.FromSeconds(30) and never read ManagementServiceOptions.CommandTimeout. HandleRequest now resolves IOptions<ManagementServiceOptions> from context.RequestServices and computes the Ask timeout via a new ManagementEndpoints.ResolveAskTimeout helper, which returns the configured CommandTimeout when strictly positive and otherwise falls back to the 30s default (guarding against a misconfigured zero/negative value that would fail every call). Regression tests: ResolveAskTimeout_UsesConfiguredCommandTimeout, ResolveAskTimeout_WithNullOptions_FallsBackToDefault, ResolveAskTimeout_WithNonPositiveTimeout_FallsBackToDefault.

ManagementService-011 — ResolveRolesCommand dispatch path is stale dead code


Severity	Low
Category	Correctness & logic bugs
Status	Resolved
Location	`src/ScadaLink.ManagementService/ManagementActor.cs:273`, `:283`

Description

The design doc states the HTTP endpoint "collapses the CLI's previous two-step flow (ResolveRoles + actual command) into a single HTTP round-trip", and indeed ManagementEndpoints performs LDAP auth and role resolution itself before dispatching. The ResolveRolesCommand case in DispatchCommand is therefore unreachable from the HTTP path. It remains reachable only via a raw ClusterClient sender, but a caller able to send ResolveRolesCommand could enumerate role mappings for arbitrary LDAP groups with no role requirement (GetRequiredRole returns null for it) — a minor information-disclosure surface for a path the design says no longer exists.

Recommendation

If the two-step flow is genuinely retired, remove ResolveRolesCommand, its handler, and the class. If it must remain for non-HTTP clients, document why and confirm exposing role-mapping data unauthenticated is intended.

Resolution

Resolved 2026-05-16 (commit pending). Confirmed dead path: a repository-wide search found no ResolveRolesCommand sender outside ManagementActor itself — the CLI and HTTP endpoint perform LDAP auth + role resolution inline. Removed the ResolveRolesCommand dispatch case and the HandleResolveRoles handler from ManagementActor; a stray ClusterClient sender now falls through to the NotSupportedException default and gets a uniform ManagementError (closing the unauthenticated role-mapping enumeration surface, since GetRequiredRole returned null for it). A code comment at the former dispatch site documents the intentional omission. Note: the ResolveRolesCommand record itself lives in src/ScadaLink.Commons/Messages/Management/SecurityCommands.cs and was left in place — that file is outside this module's permitted edit scope; deleting the orphan record should be done as a Commons-module follow-up. With the handler removed it is now an inert, registry-only type with no behaviour. Regression test: ResolveRolesCommand_IsNoLongerDispatched_ReturnsManagementError.

ManagementService-012 — ManagementEnvelope carries a loosely-typed object payload


Severity	Low
Category	Akka.NET conventions
Status	Deferred
Location	`src/ScadaLink.Commons/Messages/Management/ManagementEnvelope.cs:7`; `src/ScadaLink.ManagementService/ManagementActor.cs:132`

Description

ManagementEnvelope.Command is typed object, so the actor relies on a large open-ended switch with a NotSupportedException default for unknown types. While the individual command records are immutable, object defeats compile-time exhaustiveness — adding a new command record produces no compiler signal that DispatchCommand (and GetRequiredRole) need updating, and a typo or unregistered command surfaces only as a runtime exception. The message contract is also harder to evolve safely under the additive-only rule.

Recommendation

Introduce a marker interface (e.g. IManagementCommand) implemented by every command record and type the envelope payload as that interface. This documents the contract, lets analyzers flag unhandled cases, and keeps ManagementCommandRegistry's reflection scan precise.

Resolution

Deferred 2026-05-16. Finding verified as genuine: ManagementEnvelope.Command is typed object and the recommended IManagementCommand marker-interface fix is sound. However, the fix cannot be implemented within the ManagementService module: both ManagementEnvelope and all ~50 *Command records live in src/ScadaLink.Commons/Messages/Management/ (17 files), which is outside this work item's permitted edit scope (src/ScadaLink.ManagementService/**, its tests, and this findings file only). Adding the marker interface, retyping the envelope, and having ManagementCommandRegistry constrain its reflection scan to IManagementCommand implementers is a cohesive Commons-module change and must be done there — also so the Commons message-contract additive-only evolution rules are respected. Deferred to a Commons-module work item; no ManagementService-local change is appropriate.

ManagementService-013 — No tests for site-scope enforcement, the HTTP endpoint, or DebugStreamHub


Severity	Medium
Category	Testing coverage
Status	Resolved
Location	`tests/ScadaLink.ManagementService.Tests/ManagementActorTests.cs:1`

Description

ManagementActorTests covers role-based authorization, success/error mapping, and correlation IDs thoroughly, but several critical paths are untested: (a) site-scope enforcement — EnforceSiteScope/EnforceSiteScopeForInstance and SiteScopeViolationException -> Unauthorized mapping have no test, which is why the gaps in findings 001/002 went unnoticed; (b) ManagementEndpoints — Basic Auth decoding, malformed-header handling, LDAP/role resolution, command deserialization, and HTTP status mapping have zero coverage; (c) DebugStreamHub authentication, subscribe/unsubscribe lifecycle, and ManagementCommandRegistry.Resolve are untested. The Envelope test helper always passes Array.Empty<string>() for permitted sites, so no test ever exercises a site-scoped user.

Recommendation

Add tests that exercise a site-scoped Deployment user against in-scope and out-of-scope targets for instance and site operations, asserting ManagementUnauthorized on violations. Add WebApplicationFactory-based tests for ManagementEndpoints covering auth failures, malformed bodies, unknown commands, and the 200/400/403/401/504 mappings.

Resolution

Resolved 2026-05-16 (commit pending). The site-scope and DebugStreamHub coverage gaps were closed by the resolution of findings 001/002/003 (the ScopedEnvelope helper plus the *_OutOfScopeForSiteScopedUser_ReturnsUnauthorized tests and DebugStreamHubTests). The remaining HTTP-endpoint gap is now covered by a new ManagementEndpointsTests.cs exercising ManagementEndpoints.ParseCommand — command deserialization, malformed JSON, missing command field, and unknown commands. Full WebApplicationFactory auth-flow tests were deliberately not added: ManagementEndpoints depends on LdapAuthService and live LDAP infrastructure, so the testable command-parsing/dispatch logic was extracted into the pure ParseCommand helper and covered instead. Tests: ParseCommand_* (5), SerializeResult_* (2), UnknownCommandType_FaultMappedToManagementError, plus the pre-existing site-scope and DebugStreamHub suites. dotnet test -> 48 passed.

ManagementService-014 — HandleQueryDeployments bypasses site-scope enforcement


Severity	High
Category	Security
Status	Resolved
Location	`src/ScadaLink.ManagementService/ManagementActor.cs:306`, `:1174`

Description

QueryDeploymentsCommand is gated to the Deployment role by GetRequiredRole (ManagementActor.cs:170–:177), and the design document's Authorization section states "Site scoping is enforced for site-scoped Deployment users" and explicitly lists deployments among the Deployment-role operations. HandleQueryDeployments makes no call to EnforceSiteScope / EnforceSiteScopeForInstance / EnforceSiteScopeForIdentifier: with no InstanceId it returns IDeploymentManagerRepository.GetAllDeploymentRecordsAsync() (every deployment record across all sites), and with an InstanceId it returns that instance's deployment history with no check that the instance's site is within the caller's permitted set. A site-scoped Deployment user scoped to site A can therefore enumerate deployment records for instances at site B — instance IDs, DeployedBy (operator usernames), timestamps, deployment status, and ErrorMessage content — by issuing QueryDeployments with or without an out-of-scope InstanceId. This is the same authorization-bypass class as the resolved findings 001/002, on a handler that sweep did not cover; it is DispatchCommand's only Deployment-role handler with no scope enforcement.

Recommendation

Thread AuthenticatedUser into HandleQueryDeployments (the dispatch case at line 306 already has user in scope). When cmd.InstanceId is supplied, call EnforceSiteScopeForInstance before querying. When it is not supplied, filter the returned DeploymentRecord list to the caller's permitted sites — resolve each record's instance to its SiteId (or join through a site-aware repository query) and drop records for sites outside PermittedSiteIds, mirroring the HandleListInstances / HandleListSites filter pattern. Add a regression test for a site-scoped user against in-scope and out-of-scope instances.

Resolution

Resolved 2026-05-17 (commit pending). Re-triage: the finding understated the gap — it claimed QueryDeploymentsCommand was already "gated to the Deployment role by GetRequiredRole (lines 170–177)". Verified against the source: QueryDeploymentsCommand appeared nowhere in GetRequiredRole, so it required no role at all — any authenticated user could read every deployment record system-wide. Fix applied both gates: added QueryDeploymentsCommand to the Deployment-role group in GetRequiredRole, and threaded AuthenticatedUser into HandleQueryDeployments — the InstanceId branch now calls EnforceSiteScopeForInstance; the unfiltered branch resolves each record's instance to its SiteId (cached) and drops records outside PermittedSiteIds, mirroring HandleListInstances. Regression tests: QueryDeployments_WithDesignRole_ReturnsUnauthorized, QueryDeployments_FilteredByOutOfScopeInstance_ReturnsUnauthorized, QueryDeployments_FilteredByInScopeInstance_ReturnsRecords, QueryDeployments_UnfilteredForSiteScopedUser_DropsOutOfScopeRecords, QueryDeployments_UnfilteredForAdminUser_ReturnsAllRecords.

ManagementService-015 — HandleSetInstanceOverrides applies overrides non-atomically


Severity	Medium
Category	Error handling & resilience
Status	Resolved
Location	`src/ScadaLink.ManagementService/ManagementActor.cs:647`–`:659`

Description

HandleSetInstanceOverrides iterates cmd.Overrides and calls InstanceService.SetAttributeOverrideAsync once per attribute, throwing InvalidOperationException on the first result.IsSuccess == false. Each SetAttributeOverrideAsync call persists independently, so if the command supplies five overrides and the third fails (e.g. an unknown attribute name or a validation error), the first two overrides are already committed to the configuration database while the caller receives a ManagementError. The instance is left partially mutated in a state the operator neither sees nor requested, and the per-instance operation lock referenced in the design's deployment decisions does not protect against this because the partial writes are committed before the throw. A retry of the same command then re-applies the already-applied overrides.

Recommendation

Make the multi-override mutation all-or-nothing: either validate every requested override up front before applying any, or apply all overrides within a single transaction / unit-of-work so a mid-batch failure rolls back the earlier writes. If InstanceService cannot offer a batch method, at minimum document the partial-application behaviour on SetInstanceOverridesCommand and have the handler report which overrides were applied before the failure so the caller can reconcile.

Resolution

Resolved 2026-05-17 (commit pending). Confirmed: each SetAttributeOverrideAsync call commits independently, so a mid-batch failure left earlier overrides persisted. HandleSetInstanceOverrides now validates every requested attribute up front against the instance's template (exists, not locked) and only begins applying once the whole batch is known valid — eliminating the realistic partial-mutation failure modes (unknown / locked attribute). InstanceService is outside this module's edit scope and offers no batch/ transactional method, so a genuine database fault mid-apply remains theoretically possible; this residual is documented in a code comment on the handler. Regression tests: SetInstanceOverrides_WithOneInvalidAttribute_PersistsNoOverrides, SetInstanceOverrides_AllValid_PersistsAllOverrides.

ManagementService-016 — Unexpected exception messages returned verbatim to HTTP callers


Severity	Low
Category	Security
Status	Resolved
Location	`src/ScadaLink.ManagementService/ManagementActor.cs:121`–`:131`

Description

MapFault maps any non-SiteScopeViolationException fault to new ManagementError(correlationId, cause.Message, "COMMAND_FAILED"), and ManagementEndpoints.HandleRequest returns that Error string directly in the HTTP 400 body. For handler-thrown InvalidOperationExceptions carrying a curated result.Error message this is intended and safe. But the same path also surfaces the raw .Message of unanticipated exceptions — a SqlException (which can include server/database names and constraint details), a DbUpdateException, an ArgumentException from Enum.Parse on a malformed DataType/TriggerType value, or a NullReferenceException — straight to the external CLI/HTTP client. This is a minor internal-detail disclosure surface: the exception text is already logged server-side with full context, so the client does not need the raw message.

Recommendation

Distinguish handler-curated failures from unexpected faults. Have handlers throw a dedicated exception type (e.g. ManagementCommandException) for messages that are safe to surface, and in MapFault return that message for the known type while returning a generic "An internal error occurred (CorrelationId=...)" string for everything else — the operator can still correlate to the server log via the correlation ID.

Resolution

Resolved 2026-05-17 (commit pending). Added a ManagementCommandException type for curated, caller-safe failures and converted every curated throw in ManagementActor (the 34 result.Error rethrows and 15 "not found" / delete-blocked messages) to it. MapFault now returns the message verbatim only for ManagementCommandException; any other fault (DB errors, Enum.Parse ArgumentException, NullReferenceException, the unknown-command NotSupportedException, etc.) yields a generic "An internal error occurred (CorrelationId=...)" string while the full exception is still logged server-side. Regression tests: UnexpectedFault_ReturnsGenericMessage_NotRawExceptionText, CuratedHandlerFailure_SurfacesTheCuratedMessage; the two pre-existing *_WhenRepoThrows_* tests were updated to assert the raw repository message is no longer leaked.

ManagementService-017 — QueryDeploymentsCommand has no test coverage


Severity	Low
Category	Testing coverage
Status	Resolved
Location	`tests/ScadaLink.ManagementService.Tests/ManagementActorTests.cs:1`

Description

QueryDeploymentsCommand / HandleQueryDeployments is exercised by no test in ManagementActorTests. There is no test that it requires the Deployment role, no test of the InstanceId-filtered versus unfiltered branches, and — because the handler performs no site-scope enforcement at all — no test that would have caught finding 014. The deployment query is one of the operations the design's Authorization section calls out for site scoping, yet it is the only Deployment-role command with neither an authorization test nor a site-scope test.

Recommendation

Add tests for QueryDeploymentsCommand: a role test (Design/no-role caller → ManagementUnauthorized), branch coverage for the InstanceId-filtered and unfiltered repository calls, and — once finding 014 is fixed — site-scope tests for a site-scoped Deployment user against in-scope and out-of-scope deployment records.

Resolution

Resolved 2026-05-17 (commit pending). Added seven QueryDeployments_* tests to ManagementActorTests: role gate (_WithDesignRole_ReturnsUnauthorized), the InstanceId-filtered and unfiltered branches (_FilteredByInstanceId_ReturnsInstanceRecords, _UnfilteredWithDeploymentRole_ReturnsAllRecords), and site-scope coverage for a site-scoped Deployment user and an Admin user, in- and out-of-scope (_FilteredByOutOfScopeInstance_ReturnsUnauthorized, _FilteredByInScopeInstance_ReturnsRecords, _UnfilteredForSiteScopedUser_DropsOutOfScopeRecords, _UnfilteredForAdminUser_ReturnsAllRecords).

ManagementService-018 — QueryAuditLogCommand has no role gate


Severity	High
Category	Security
Status	Resolved
Location	`src/ScadaLink.ManagementService/ManagementActor.cs:153`–`:207`, `:336`, `:1302`

Description

QueryAuditLogCommand is dispatched at line 336 to HandleQueryAuditLog, which calls ICentralUiRepository.GetAuditLogEntriesAsync(...) with no role check, no site-scope check, and no actor filter. GetRequiredRole (lines 153–207) does not list QueryAuditLogCommand, so it falls through to the _ => null case — i.e. "read-only queries — any authenticated user". The parallel /api/audit/query endpoint in AuditEndpoints.HandleQuery correctly enforces AuthorizationPolicies.OperationalAuditRoles ({ "Admin", "Audit", "AuditReadOnly" }), so a CLI authenticated as a user with only the Deployment role — or no roles at all — is rejected at /api/audit/query but can read the same audit log table through /management by sending QueryAuditLogCommand. The two surfaces enforce different permissions on the same data; the older ManagementActor-routed path is the looser one. The audit log records every script-trust- boundary action and is sensitive operationally — it should not be readable by a default authenticated user.

This is the same authorization-bypass class as findings 001/002/014 and was missed in that sweep because QueryAuditLogCommand (legacy Action/EntityType filter) is a separate command from the new keyset-paged IAuditLogRepository.QueryAsync path the /api/audit/query endpoint uses.

Recommendation

Add QueryAuditLogCommand to GetRequiredRole. The natural fit is a new "OperationalAudit"-style role group — but GetRequiredRole returns a single string and the project's existing role gates do too (Admin/Design/Deployment). Two equally defensible options:

Add QueryAuditLogCommand to the Admin-required group — strict, mirrors that AuditExportRoles includes Admin. The CLI's CLI-017/018 audit work uses /api/audit/query, so QueryAuditLogCommand may be effectively orphaned anyway.
Extend GetRequiredRole to return a role set and add an AuditRoles group equal to AuthorizationPolicies.OperationalAuditRoles, so the two surfaces converge.

Recommended: option 1 plus a deprecation comment on QueryAuditLogCommand pointing at /api/audit/query — the legacy command's filter shape is a subset of the new endpoint's, so the ManagementActor route is redundant. Add a regression test asserting that a no-role / Deployment-only caller gets ManagementUnauthorized for QueryAuditLogCommand.

Resolution

Resolved 2026-05-28 (commit pending) per recommendation option 1. QueryAuditLogCommand was added to the Admin-required group in GetRequiredRole, with an inline comment documenting the deliberate strictness vs. the keyset-paged /api/audit/query (OperationalAuditRoles) and pointing new audit consumers at the REST endpoint. The CentralUI ConfigurationAuditLog page reads via ICentralUiRepository directly (not through this command), so the gate tightening does not break any UI flow. Two regression tests pin the new behaviour: QueryAuditLogCommand_WithNoRoles_ReturnsUnauthorized and QueryAuditLogCommand_WithDeploymentRole_ReturnsUnauthorized — both fail on the pre-fix code (the command fell through to "any authenticated user") and pass after.

ManagementService-019 — AuditEndpoints builds PermittedSiteIds but never enforces them


Severity	Medium
Category	Security
Status	Resolved
Location	`src/ScadaLink.ManagementService/AuditEndpoints.cs:358`–`:368`, `:397`–`:437`

Description

AuditEndpoints.AuthenticateAsync resolves the caller's roles AND PermittedSiteIds and wraps them in an AuthenticatedUser (lines 358–366), but the returned AuthenticatedUser is then only used for the HasAnyRole(...) role check on lines 114 and 163 — its PermittedSiteIds are never read. ParseFilter (line 397) accepts the caller-supplied sourceSiteId=... query string verbatim and passes it straight into the IAuditLogRepository.QueryAsync filter. A user whose Audit (or AuditReadOnly) role mapping carries scope rules — e.g. AuditReadOnly scoped to "plant-a" — can still ask for sourceSiteId=plant-b and get back rows for plant-b.

Today this gap is partially benign because the design treats Audit/AuditReadOnly as non-site-scoped roles (Component-AuditLog.md does not list site scoping for the audit permissions, and the LDAP role mapping UI does not currently surface site scope rules for those roles). But (a) the RoleMapper will silently honour scope rules attached to any role, including Audit, so an operator who does configure them gets a UI that says "scoped" and an endpoint that ignores the scope — a contract violation; (b) the Admin role's PermittedSiteIds are always empty (system-wide), so enforcing for the other roles is cheap. The asymmetry with the /management endpoint — which routes every site-targeted command through EnforceSiteScope — is also a maintenance hazard.

Recommendation

Decide explicitly whether the audit endpoints honour site scope. Two options:

Honour scope — in HandleQuery / HandleExport, after the role check, intersect the caller-supplied filter.SourceSiteIds with user.PermittedSiteIds. If the caller supplied no sourceSiteId and PermittedSiteIds is non-empty, restrict to PermittedSiteIds. If the intersection is empty, return an empty page (or a 403 if the caller explicitly asked for an out-of-scope site).
Document the intentional bypass — drop the PermittedSiteIds field from the AuthenticatedUser constructed in AuthenticateAsync (or comment it as "ignored — audit roles are not site-scoped") so the code stops carrying a value it does not read, and add an XML doc note on the endpoint class that audit roles are always system-wide by design.

Recommended: option 1, mirroring the ManagementActor pattern — same security posture across both surfaces. Add a regression test that a site-scoped AuditReadOnly user filtering on an out-of-scope site gets a 403 (or an empty page).

Resolution

Resolved 2026-05-28 (commit pending) per recommendation option 1. Added a public helper AuditEndpoints.ApplySiteScope(AuditLogQueryFilter, AuthenticatedUser) that returns the restricted filter (or null when the caller explicitly asks for an out-of-scope site). Three cases:

Empty PermittedSiteIds (Admin or any unscoped role) → filter returned unchanged.
Scoped user with empty caller filter → SourceSiteIds set to the permitted set.
Scoped user with explicit SourceSiteIds → intersected with the permitted set; empty intersection returns null so HandleQuery / HandleExport emit a 403 rather than silently producing an empty page.

Both HandleQuery and HandleExport now call the helper after the role check and short-circuit to Forbidden("OperationalAudit"|"AuditExport") on a null result. Audit roles remain non-site-scoped by design (the design doc unchanged), but the helper honours scope rules if an operator attaches them via the LDAP-mapping UI, matching the existing ManagementActor pattern. Regression tests added in AuditEndpointsTests.ApplySiteScope_* (5 tests): system-wide unchanged, empty-caller-filter restricted, in-scope kept verbatim, out-of-scope returns null, mixed-set intersected.

ManagementService-020 — UpdateSmtpConfig returns and audits the SMTP Credentials field verbatim


Severity	Medium
Category	Security
Status	Resolved
Location	`src/ScadaLink.ManagementService/ManagementActor.cs:1136`–`:1153`

Resolution (2026-05-28): Added SmtpConfigPublicShape projection that returns every non-secret field plus a HasCredentials bool — never the Credentials field itself. Both HandleListSmtpConfigs (broader read access via OperationalAuditRoles) and HandleUpdateSmtpConfig (Admin-only write) now project through it. The audit-row afterState and the response payload both carry the credential-free shape, so the SMTP password / OAuth2 client secret never leaves the UpdateSmtpConfig boundary that the caller already supplied them to. ManagementService 100/100 tests still pass. Follow-up to tag SmtpConfiguration.Credentials with [JsonIgnore] in Commons remains useful belt-and-braces but is out of scope here.

Description

HandleUpdateSmtpConfig reads the existing SmtpConfiguration entity, applies the incoming command, and then (a) passes the full config object as the afterState to AuditAsync (line 1151) — meaning the SMTP credential string is persisted in the audit log — and (b) returns the full config to the caller (line 1152), which is serialized via SerializeResult and sent back over HTTP. SmtpConfiguration.Credentials carries the SMTP-Auth password (for Basic) or the OAuth2 client secret (for OAuth2ClientCredentials); SmtpConfiguration has no [JsonIgnore] on this field and SerializeResult's JsonSerializerOptions does not exclude it. The pattern parallels what ConfigurationDatabase-012 fixed for inbound API keys: a credential artifact must not be echoed back through every read/audit path.

The credential is supplied by the operator in UpdateSmtpConfigCommand.Credentials, so the caller already has it. But (1) anyone with read access to the audit log (OperationalAuditRoles) can now retrieve every SMTP credential change verbatim — a strictly larger blast radius than Admin-only UpdateSmtpConfig. (2) The serialized config echo means the credential moves over the wire in the response even though the caller has no need for it. (3) Any future read path that returns SmtpConfiguration — ListSmtpConfigsCommand already does at line 1130 — will leak the stored credential too.

Recommendation

Three changes, in order of priority:

In HandleUpdateSmtpConfig and HandleListSmtpConfigs, project to a credential-free shape before returning — e.g. new { config.Id, config.Host, config.Port, config.AuthType, config.FromAddress, config.TlsMode }. Match the HandleListApiKeys pattern.
In AuditAsync for the SMTP path, pass a credential-free afterState (the same anonymous shape). The fact that something changed is auditable; the secret value is not.
Tag SmtpConfiguration.Credentials with [JsonIgnore] in Commons (out-of-scope edit for this module, but worth a follow-up). Alternatively, configure ResultSerializerOptions with a property name policy that skips a known set of credential field names — but a per-entity projection is cleaner.

Add regression tests: UpdateSmtpConfig_DoesNotEchoCredentialsInResponse and UpdateSmtpConfig_DoesNotPersistCredentialsInAuditLog.

ManagementService-021 — Transport bundle handlers have zero test coverage


Severity	Medium
Category	Testing coverage
Status	Resolved
Location	`tests/ScadaLink.ManagementService.Tests/ManagementActorTests.cs:1`; `src/ScadaLink.ManagementService/ManagementActor.cs:1717`–`:1897`

Resolution (2026-05-28): Added six ManagementActorTests cases covering the load-bearing bundle behaviours. Role gating: ExportBundleCommand_WithAdminRole_ReturnsUnauthorized (Export needs Design), PreviewBundleCommand_WithDesignRole_ReturnsUnauthorized, and ImportBundleCommand_WithDesignRole_ReturnsUnauthorized (Preview/Import need Admin). Name resolution: ExportBundleCommand_WithUnknownTemplateName_ReturnsManagementError proves the ResolveIds ManagementCommandException surfaces verbatim with the missing entity type + name. Handler logic: ImportBundleCommand_WithBlockerRow_AbortsBeforeApply seeds a ConflictKind.Blocker preview and asserts IBundleImporter.ApplyAsync is never called and the error names the blocker; ImportBundleCommand_DuplicatePreviewItems_DedupePerEntityTypeAndName seeds three rows for the same (Template, "Dup") key (Identical, Modified, Identical) and asserts only one resolution reaches ApplyAsync with last-write-wins (Skip, overriding the prior Modified/Overwrite). All bundle tests use substituted IBundleExporter/IBundleImporter via a new AddBundleSubstitutes() helper plus stub GetAll*Async repository returns. 106/106 ManagementService.Tests pass.

Description

The three Transport (#24) bundle handlers — HandleExportBundle, HandlePreviewBundle, HandleImportBundle (~180 lines of handler logic at the bottom of ManagementActor.cs) — have no tests in ManagementActorTests. Specifically untested:

Role gating. ExportBundleCommand requires Design; PreviewBundleCommand and ImportBundleCommand require Admin. No test asserts that the wrong role gets ManagementUnauthorized. CLI-017 / CLI-018 just landed around bundle plumbing — a future refactor that moves these commands between role groups in GetRequiredRole would silently regress the gate.
Name resolution in HandleExportBundle. The inner ResolveIds<T> helper raises ManagementCommandException for unknown names. The "all entity types" branch (cmd.All == true) and the "missing name" branch are both untested.
HandleImportBundle blocker rejection. The handler aborts before ApplyAsync when any ConflictKind.Blocker row is present; the produced error message is curated and surfaced to the caller, but no test asserts the abort path or that the importer's ApplyAsync was not called.
Resolution dedupe. HandleImportBundle dedupes (EntityType, Name) keys last-write-wins — the dedupe is critical (CLI-014 was about it on the CLI side) but has no actor-side regression test.
DecodeBundle failure modes (empty/non-base64 input) — both branches return curated ManagementCommandException but neither is exercised.
ParseConflictPolicy for "skip", "overwrite", "rename", and the invalid- value branch — all untested.

Given the size and reach of the bundle path (cross-cutting central configuration import), this gap is materially larger than usual for new handler code.

Recommendation

Add an ImportBundleHandlerTests suite covering:

role gating for all three commands (Design/Admin mismatch -> ManagementUnauthorized),
ExportBundleCommand(All: true) happy-path,
ExportBundleCommand with an unknown name -> ManagementError,
ImportBundleCommand with a Blocker row -> ManagementError and ApplyAsync not called,
ImportBundleCommand with duplicate preview items -> dedupe to one resolution per (type, name),
DecodeBundle empty/invalid base64,
ParseConflictPolicy all four branches.

Use NSubstitute for IBundleImporter / IBundleExporter (no need for a real bundle in the actor tests; the bundle round-trip belongs in Transport tests).

ManagementService-022 — Design doc is stale on Transport bundle commands, /api/audit/* endpoints, and CommandTimeout


Severity	Low
Category	Design-document adherence
Status	Resolved
Location	`docs/requirements/Component-ManagementService.md:77`–`:175`, `:205`–`:209`

Resolution (2026-05-28): Updated Component-ManagementService.md — added a "Transport (Bundle Import / Export)" entry under "Message Groups" listing ExportBundle (Design), PreviewBundle/ImportBundle (Admin); inserted a new "HTTP Audit API" section describing GET /api/audit/query (OperationalAudit) and GET /api/audit/export (AuditExport) with the site-scope intersection rule; and rewrote the Configuration table to document the now-wired CommandTimeout (30 s default, non-positive falls back). Legacy QueryAuditLog is now annotated as Admin-gated and superseded by the REST surface for #23.

Description

Component-ManagementService.md does not mention three pieces of shipped functionality:

Transport (#24) bundle commands. ExportBundleCommand, PreviewBundleCommand, and ImportBundleCommand are dispatched at ManagementActor.cs:350–:352 and role-gated in GetRequiredRole (Design for Export; Admin for Preview/Import). The design doc's "Message Groups" section enumerates Templates, Instances, Sites, Data Connections, Deployments, External Systems, Notifications, Security, Audit Log, Shared Scripts, Database Connections, Inbound API Methods, Health, and Remote Queries — but has no "Transport" / "Bundles" group. The CLI now offers bundle export/preview/import (per the recent CLI-017/018 work) and points at these commands.
/api/audit/* endpoints. The doc's "HTTP Management API" section (line 52) describes only POST /management. AuditEndpoints.MapAuditAPI() adds GET /api/audit/query and GET /api/audit/export with their own auth-and-role path mirroring ManagementEndpoints (intentionally — see the AuditEndpoints XML docs), but the design doc gives no signal that the module exposes more than one route group, no per-endpoint role mapping table, and no mention that the response shape differs (keyset cursor vs. opaque page).
CommandTimeout. Line 209 still says "Reserved for future configuration — e.g., command timeout overrides", but ManagementService-010 wired the option through ResolveAskTimeout. The doc is stale.

Recommendation

Update Component-ManagementService.md:

Add a "Transport" entry to "Message Groups" listing ExportBundle, PreviewBundle, ImportBundle with their per-command roles. Cross-reference Component-Transport.md.
Add an "Audit Log HTTP API" subsection under "HTTP Management API" describing GET /api/audit/query (keyset cursor, OperationalAuditRoles) and GET /api/audit/export (csv/jsonl streaming, AuditExportRoles, parquet 501). Note the deliberate divergence in the source-site query-string key (sourceSiteId vs CentralUI's site).
In the "Configuration" table, replace "Reserved for future configuration" with the actual CommandTimeout semantics: "Max time the HTTP endpoint will Ask the ManagementActor before returning HTTP 504; falls back to 30 s when unset or non-positive."

ManagementService-023 — HandleQueryDeployments unfiltered branch is N+1 on instance lookup


Severity	Low
Category	Performance & resource management
Status	Resolved
Location	`src/ScadaLink.ManagementService/ManagementActor.cs:1276`–`:1295`

Resolution (2026-05-28): Swapped the per-InstanceId GetInstanceByIdAsync lookup loop for a single templateRepo.GetAllInstancesAsync() bulk fetch, projected into a Dictionary<int, int?> keyed by Instance.Id. The unfiltered branch now hits the configuration database exactly twice (deployment records + instances) regardless of fleet size. Existing test QueryDeployments_UnfilteredForSiteScopedUser_DropsOutOfScopeRecords was updated to mock the bulk path and to assert GetInstanceByIdAsync is no longer used on the unfiltered branch; new regression test QueryDeployments_UnfilteredForSiteScopedUser_UsesBulkInstanceLoad_NotPerRecordLookup pins GetAllInstancesAsync is invoked exactly once even with duplicate-instance records.

Description

The site-scoped unfiltered branch of HandleQueryDeployments (added under ManagementService-014) reads every DeploymentRecord via GetAllDeploymentRecordsAsync, then for each unique record.InstanceId calls ITemplateEngineRepository.GetInstanceByIdAsync to resolve the instance's SiteId. The handler caches results in instanceSiteCache so each instance is loaded at most once per call, but for a fleet with N distinct instances having deployment history, the handler still issues N round-trips to the configuration database to authorize a single query. With a large deployment history the cumulative DB hit can be material; it also runs every time a site-scoped user opens the deployments page.

This is acceptable in steady state today (sites tend to have small fleets and few deployments) but is a textbook N+1 read pattern, and on a busy day for a site-scoped operator the cost will dominate the request. Admin and system-wide Deployment users correctly skip the loop (they hit only GetAllDeploymentRecordsAsync).

Recommendation

Add a batch-resolve method to ITemplateEngineRepository — e.g. Task<IDictionary<int, int>> GetInstanceSiteIdsAsync(IEnumerable<int> instanceIds) — backed by a single EF query (Instances.Where(i => instanceIds.Contains(i.Id)).Select(i => new { i.Id, i.SiteId })). HandleQueryDeployments would then issue exactly two queries on the unfiltered branch (records + sites) regardless of fleet size. The change is additive to ITemplateEngineRepository and out-of-module for the actual implementation, but the handler change is local; a quick interim alternative is to project deployment records to include the instance's SiteId at the repo level, which removes the second query entirely.

Defer until a noticeable hot path emerges, but track it: this is the only N+1 in ManagementActor once 002 / 014 are folded in.

66 KiB Raw Blame History Unescape Escape

Code Review — ManagementService

Summary

Re-review 2026-05-17 (commit 39d737e)

Re-review 2026-05-28 (commit 1eb6e97)

Checklist coverage

Findings

ManagementService-001 — Remote-query and debug-snapshot handlers bypass site-scope enforcement

ManagementService-002 — Single-entity query handlers leak data across site scope

ManagementService-003 — DebugStreamHub.SubscribeInstance performs no per-instance authorization

ManagementService-004 — Actor offloads work to Task.Run instead of using PipeTo

ManagementService-005 — ManagementActor declares no supervision strategy

ManagementService-006 — JsonDocument instances never disposed in the HTTP endpoint

ManagementService-007 — Inconsistent and cycle-prone serialization of repository entities

ManagementService-008 — HandleResolveRoles constructs RoleMapper manually instead of via DI

ManagementService-009 — Audit logging applied inconsistently across mutating handlers

ManagementService-010 — ManagementServiceOptions.CommandTimeout is defined but never used

ManagementService-011 — ResolveRolesCommand dispatch path is stale dead code

ManagementService-012 — ManagementEnvelope carries a loosely-typed object payload

ManagementService-013 — No tests for site-scope enforcement, the HTTP endpoint, or DebugStreamHub

ManagementService-014 — HandleQueryDeployments bypasses site-scope enforcement

ManagementService-015 — HandleSetInstanceOverrides applies overrides non-atomically

ManagementService-016 — Unexpected exception messages returned verbatim to HTTP callers

ManagementService-017 — QueryDeploymentsCommand has no test coverage

ManagementService-018 — QueryAuditLogCommand has no role gate

ManagementService-019 — AuditEndpoints builds PermittedSiteIds but never enforces them

ManagementService-020 — UpdateSmtpConfig returns and audits the SMTP Credentials field verbatim

ManagementService-021 — Transport bundle handlers have zero test coverage

ManagementService-022 — Design doc is stale on Transport bundle commands, /api/audit/* endpoints, and CommandTimeout

ManagementService-023 — HandleQueryDeployments unfiltered branch is N+1 on instance lookup

66 KiB

Raw Blame History

Re-review 2026-05-17 (commit `39d737e`)

Re-review 2026-05-28 (commit `1eb6e97`)