# Code Review — ManagementService

| Field | Value |
|-------|-------|
| Module | `src/ZB.MOM.WW.ScadaBridge.ManagementService` |
| Design doc | `docs/requirements/Component-ManagementService.md` |
| Status | Reviewed |
| Last reviewed | 2026-05-28 |
| Reviewer | claude-agent |
| Commit reviewed | `1eb6e97` |
| Open findings | 0 (1 Deferred — see ManagementService-012) |

## Summary

The ManagementService module is a thin command-dispatch layer: a single `ManagementActor`
fronts every administrative operation, an HTTP `POST /management` endpoint authenticates and
forwards to it, and a SignalR `DebugStreamHub` provides real-time debug streaming. The code
is consistently structured and the role-based authorization gate (`GetRequiredRole`) is
broadly correct and well tested. However, the review surfaced a significant **security
theme**: site-scope enforcement, which the design document requires for instance- and
site-targeted Deployment operations, is applied inconsistently — several query handlers and
all remote-query/debug handlers perform no site-scope check at all, allowing a site-scoped
Deployment user to read or act on sites outside their scope. A second theme is **Akka.NET
convention drift**: the actor offloads all work to `Task.Run` instead of using `PipeTo`,
declares no supervision strategy, and the contract messages carry a loosely-typed `object`
payload. There are also resource-management defects in the HTTP endpoint (`JsonDocument`
instances never disposed) and dead/unused configuration. None of the findings are
crash-class, but the site-scope gaps are High severity because they are a real
authorization bypass with no workaround.

#### Re-review 2026-05-17 (commit `39d737e`)

All thirteen prior findings remain correctly closed; the source under
`src/ZB.MOM.WW.ScadaBridge.ManagementService` is byte-identical between the previously reviewed state
and `39d737e` (the resolution commits of findings 001–013 are folded into the history at or
before `39d737e`). ManagementService-012 was re-checked and its **Deferred** status still
holds: `ManagementEnvelope.Command` is still typed `object`, and the marker-interface fix
still belongs in the Commons module, outside this module's edit scope — nothing has changed
to make it actionable here. This re-review re-ran the full 10-category checklist against the
current sources and surfaced **four new findings**. The dominant theme is the same
site-scope authorization gap that findings 001/002 closed: `HandleQueryDeployments`
(`QueryDeploymentsCommand`) was overlooked by that sweep and still performs no site-scope
enforcement, letting a site-scoped Deployment user read deployment history for any site
(014, High). The remaining three are lower severity: a non-atomic multi-override mutation
that can leave an instance partially modified after an error (015, Medium), raw exception
messages from unexpected faults being returned verbatim to HTTP callers (016, Low), and
`QueryDeploymentsCommand` having no test coverage at all (017, Low).

#### Re-review 2026-05-28 (commit `1eb6e97`)

All seventeen prior findings remain correctly closed; ManagementService-012 is still the
only Deferred entry (marker-interface on `ManagementEnvelope.Command` still belongs in the
Commons module). The module has grown substantially since the last review (`+1997 lines`):
the Transport (#24) bundle commands (`ExportBundle`/`PreviewBundle`/`ImportBundle`) have
been added to `ManagementActor`, and a new `AuditEndpoints.cs` (`/api/audit/query` and
`/api/audit/export`) ships alongside the existing `/management` endpoint. This re-review
re-ran the full 10-category checklist and surfaced **six new findings**. The dominant
theme is the same authorization gap that findings 001/002/003/014 closed for the
ManagementActor, now resurfacing in the new surfaces:
**QueryAuditLogCommand has no role gate at all** (018, High) — any authenticated user can
read the configuration audit log via `/management`, even though the parallel
`/api/audit/query` requires `OperationalAuditRoles`. The new `/api/audit/{query,export}`
endpoints build an `AuthenticatedUser` with `PermittedSiteIds` but never enforce site scope
(019, Medium) — although audit roles are not site-scoped by design, the user-supplied
`sourceSiteId` filter is honoured verbatim. `HandleUpdateSmtpConfig` returns the full
SmtpConfiguration entity (including the `Credentials` field, which can carry SMTP passwords
/ OAuth2 client secrets) in the response and audit row (020, Medium). The Transport (#24)
bundle commands have zero test coverage in `ManagementActorTests` (021, Medium) — neither
role gating nor success/error paths. The `Component-ManagementService.md` design doc is
stale on three fronts: it does not mention Transport bundle commands, the `/api/audit/*`
endpoints, or the now-wired `CommandTimeout` option (022, Low). Finally,
`HandleQueryDeployments` issues one `GetInstanceByIdAsync` per unique instance ID when
filtering for a site-scoped user — an N+1 read pattern on the unfiltered branch (023, Low).

## Checklist coverage

| # | Category | Examined | Notes |
|---|----------|----------|-------|
| 1 | Correctness & logic bugs | + | `HandleResolveRoles` builds `RoleMapper` by hand; `ResolveRolesCommand` is a stale dispatch path. See 008, 011. |
| 2 | Akka.NET conventions | + | `Task.Run` instead of `PipeTo`, no supervision strategy, `object`-typed message payload. See 004, 005, 012. |
| 3 | Concurrency & thread safety | + | Actor is stateless so `Task.Run` does not corrupt state, but it defeats actor-thread serialization (004). `Sender` correctly captured to a local before the closure. |
| 4 | Error handling & resilience | + | Exceptions are caught and mapped uniformly; `SiteScopeViolationException` mapped to `Unauthorized`. Audit-logging consistency issue noted in 009. |
| 5 | Security | + | Site-scope enforcement missing on query/remote/debug paths. See 001, 002, 003. |
| 6 | Performance & resource management | + | `JsonDocument` instances never disposed in the HTTP endpoint. See 006. |
| 7 | Design-document adherence | + | Design doc states remote queries enforce site scoping; code does not. `ManagementServiceOptions` reserved-for-future config is unused. See 001, 010. |
| 8 | Code organization & conventions | + | Mixed serializers (Newtonsoft in actor, System.Text.Json in endpoint); inconsistent audit logging across mutations. See 007, 009. |
| 9 | Testing coverage | + | Authorization is well covered; site-scope enforcement, the HTTP endpoint, `DebugStreamHub`, and remote-query handlers have no tests. See 013. |
| 10 | Documentation & comments | + | XML docs are accurate where present; `ManagementServiceOptions` and `ResolveRolesCommand` paths are undocumented dead code (010, 011). |

_Re-review (2026-05-28, `1eb6e97`):_

| # | Category | Examined | Notes |
|---|----------|----------|-------|
| 1 | Correctness & logic bugs | + | `HandleImportBundle` correctly dedupes resolutions per (entity,name); `ParseDocument` still allocates a `JsonDocument.Parse("{}")` on the failure path but the caller's `using` disposes it. No new defects. |
| 2 | Akka.NET conventions | + | PipeTo dispatch from 004 is intact; supervision strategy from 005 is intact; `Sender` correctly captured to local before PipeTo. No new findings. |
| 3 | Concurrency & thread safety | + | Bundle handlers `await` cleanly; `BundleSession` is not cleaned up if `PreviewAsync`/`ApplyAsync` throws, but that is an `IBundleImporter` contract concern outside this module. No new findings. |
| 4 | Error handling & resilience | + | `ManagementCommandException` from 016 is applied consistently across the new bundle handlers (curated `CryptographicException`/`ArgumentException` paths). No new findings. |
| 5 | Security | + | `QueryAuditLogCommand` has no role gate (018, High). New `/api/audit/*` endpoints build `PermittedSiteIds` but never enforce them (019, Medium). `HandleUpdateSmtpConfig` returns + audits `Credentials` verbatim (020, Medium). |
| 6 | Performance & resource management | + | `HandleQueryDeployments` unfiltered-with-scope branch is N+1 on instance lookups (023, Low). Request body up to 200 MB read into a single `string` in `HandleRequest` (acceptable per Transport bundle requirement). |
| 7 | Design-document adherence | + | `Component-ManagementService.md` is stale on Transport bundle commands, `/api/audit/*` endpoints, and the now-wired `CommandTimeout` (022, Low). |
| 8 | Code organization & conventions | + | `AuditEndpoints` duplicates the Basic Auth → LDAP → roles flow from `ManagementEndpoints` (~50 lines). Acknowledged in `AuditEndpoints` XML but worth tracking. No new finding raised. |
| 9 | Testing coverage | + | Transport bundle commands have zero `ManagementActorTests` coverage — neither role gating nor handler logic (021, Medium). |
| 10 | Documentation & comments | + | New `AuditEndpoints` XML doc is high quality. `Component-ManagementService.md` not updated for Transport/Audit endpoints (022 covers). |

## Findings

### ManagementService-001 — Remote-query and debug-snapshot handlers bypass site-scope enforcement

| | |
|--|--|
| Severity | High |
| Category | Security |
| Status | Resolved |
| Location | `src/ZB.MOM.WW.ScadaBridge.ManagementService/ManagementActor.cs:1465`, `:1481`, `:1493`, `:641`, `:649` |

**Description**

The design document (`Component-ManagementService.md`, Authorization section) states that for
Deployment users "Site scoping is enforced for site-scoped Deployment users" and lists
"debug snapshot, parked message queries, site event log queries" among the Deployment-role
operations. `HandleQueryEventLogs`, `HandleQueryParkedMessages`, `HandleDebugSnapshot`,
`HandleRetryParkedMessage`, and `HandleDiscardParkedMessage` make no call to `EnforceSiteScope`
or `EnforceSiteScopeForInstance`. A Deployment user scoped to site A can therefore query event
logs / parked messages of site B, retry or discard another site's parked messages, and pull a
debug snapshot of any instance simply by supplying a different `SiteIdentifier` or `InstanceId`.
This is an authorization bypass with no workaround.

**Recommendation**

In each of these handlers resolve the target site and call site-scope enforcement before
delegating to `CommunicationService`. For the `SiteIdentifier`-keyed handlers, look up the
`Site` by identifier and enforce against `Site.Id`; for `DebugSnapshotCommand` the instance
is already loaded — call `EnforceSiteScope(user, instance.SiteId)` (which requires threading
`AuthenticatedUser` into these handlers, currently dropped).

**Resolution**

Resolved 2026-05-16 (commit `<pending>`). Threaded `AuthenticatedUser` into
`HandleQueryEventLogs`, `HandleQueryParkedMessages`, `HandleRetryParkedMessage`,
`HandleDiscardParkedMessage`, and `HandleDebugSnapshot`; added an
`EnforceSiteScopeForIdentifier` helper that resolves the site by identifier and applies
`EnforceSiteScope`. `HandleDebugSnapshot` enforces against the already-loaded instance's
`SiteId`. Regression tests: `QueryEventLogs_OutOfScopeForSiteScopedUser_ReturnsUnauthorized`,
`QueryParkedMessages_OutOfScopeForSiteScopedUser_ReturnsUnauthorized`,
`RetryParkedMessage_OutOfScopeForSiteScopedUser_ReturnsUnauthorized`,
`DiscardParkedMessage_OutOfScopeForSiteScopedUser_ReturnsUnauthorized`,
`DebugSnapshot_OutOfScopeForSiteScopedUser_ReturnsUnauthorized`.

### ManagementService-002 — Single-entity query handlers leak data across site scope

| | |
|--|--|
| Severity | High |
| Category | Security |
| Status | Resolved |
| Location | `src/ZB.MOM.WW.ScadaBridge.ManagementService/ManagementActor.cs:510`, `:673`, `:733`, `:774`, `:631`, `:624` |

**Description**

`HandleListInstances` and `HandleListSites` correctly filter their results by the user's
`PermittedSiteIds`, but the single-entity query handlers do not. `HandleGetInstance`,
`HandleGetSite`, `HandleListAreas`, and `HandleGetDataConnection` fetch by ID with no
site-scope check, so a site-scoped Deployment user can read any instance, site, area tree,
or data connection by ID even though that site is excluded from their scope. The list
endpoints having a filter while the get-by-id endpoints do not is an inconsistency that
undermines the scoping model. (`HandleGetDeploymentDiff` and `HandleListInstanceAlarmOverrides`
do enforce scope, confirming the omission elsewhere is unintentional.)

**Recommendation**

Apply `EnforceSiteScopeForInstance` in `HandleGetInstance`, and `EnforceSiteScope` against
the resolved site ID in `HandleGetSite`, `HandleListAreas`, and `HandleGetDataConnection`
(for data connections, scope by the connection's `SiteId`).

**Resolution**

Resolved 2026-05-16 (commit `<pending>`). `HandleGetInstance`, `HandleGetSite`,
`HandleGetDataConnection` now take `AuthenticatedUser` and call `EnforceSiteScope` against
the resolved entity's site ID (instance `SiteId`, site `Id`, data-connection `SiteId`);
`HandleListAreas` enforces against the requested `SiteId` before querying. Regression tests:
`GetInstance_OutOfScopeForSiteScopedUser_ReturnsUnauthorized`,
`GetInstance_InScopeForSiteScopedUser_ReturnsSuccess`,
`GetSite_OutOfScopeForSiteScopedUser_ReturnsUnauthorized`,
`GetSite_OutOfScopeForAdminUser_ReturnsSuccess`,
`ListAreas_OutOfScopeForSiteScopedUser_ReturnsUnauthorized`,
`GetDataConnection_OutOfScopeForSiteScopedUser_ReturnsUnauthorized`.

### ManagementService-003 — DebugStreamHub.SubscribeInstance performs no per-instance authorization

| | |
|--|--|
| Severity | High |
| Category | Security |
| Status | Resolved |
| Location | `src/ZB.MOM.WW.ScadaBridge.ManagementService/DebugStreamHub.cs:104` |

**Description**

`OnConnectedAsync` authenticates the WebSocket connection and verifies the caller holds the
`Deployment` role, but `SubscribeInstance(int instanceId)` accepts any instance ID and starts
a stream without checking that the authenticated user is scoped to that instance's site. A
site-scoped Deployment user can therefore subscribe to the live debug stream (attribute
values, alarm states) of an instance belonging to a site outside their scope. This is the
streaming equivalent of finding 001/002.

**Recommendation**

Resolve the instance's site inside `SubscribeInstance` and reject the subscription if the
authenticated user's permitted-site set does not include it. The authenticated identity
established in `OnConnectedAsync` must be persisted on the connection (e.g. in
`Context.Items`) so it is available to `SubscribeInstance`.

**Resolution**

Resolved 2026-05-16 (commit `<pending>`). `OnConnectedAsync` now persists the resolved
roles and `PermittedSiteIds` in `Context.Items`. `SubscribeInstance` resolves the
instance's site via `ITemplateEngineRepository` and rejects the subscription (sending
`OnStreamTerminated`) when the new pure `DebugStreamHub.IsInstanceAccessAllowed` check
fails. The check grants access for the Admin role or system-wide Deployment (empty
permitted set) and otherwise requires the instance's site in the permitted set. Regression
tests: `IsInstanceAccessAllowed_SiteScopedUser_OutOfScopeInstance_Denied`,
`IsInstanceAccessAllowed_SiteScopedUser_InScopeInstance_Allowed`,
`IsInstanceAccessAllowed_SystemWideDeployment_AnySiteAllowed`,
`IsInstanceAccessAllowed_AdminRole_BypassesSiteScope`,
`IsInstanceAccessAllowed_AdminRoleCheck_IsCaseInsensitive`.

### ManagementService-004 — Actor offloads work to Task.Run instead of using PipeTo

| | |
|--|--|
| Severity | Medium |
| Category | Akka.NET conventions |
| Status | Resolved |
| Location | `src/ZB.MOM.WW.ScadaBridge.ManagementService/ManagementActor.cs:61` |

**Description**

`HandleEnvelope` runs every command on a thread-pool thread via `Task.Run(async () => ...)`
and replies from inside the continuation. This is the anti-pattern the project's Akka.NET
conventions warn against — the canonical approach is to start the async work and `PipeTo`
its result back to `Self`/`Sender`. Although `Sender` is correctly copied to a local before
the closure, the current code: (a) lets multiple commands execute fully concurrently with no
actor-thread serialization, so the actor provides no ordering or back-pressure guarantees
and is an actor in name only; (b) cannot be paused, supervised, or made to honour a mailbox
bound; (c) is shielded from synchronous faults only because every path is inside the
try/catch — any future code path that throws synchronously before the `Task.Run` body would
escape it.

**Recommendation**

Replace `Task.Run` with a method that returns the `Task` and `PipeTo` the mapped result
(`ManagementSuccess`/`ManagementError`/`ManagementUnauthorized`) back to the captured sender,
mapping faults in the `PipeTo` failure continuation. If genuine parallelism is desired, make
that explicit with a router/dispatcher rather than ad-hoc `Task.Run`.

**Resolution**

Resolved 2026-05-16 (commit pending). Confirmed: `HandleEnvelope` ran every command via
`Task.Run` and replied from inside the continuation, contrary to the project's PipeTo
convention. Replaced it with a `ProcessCommand` method returning a `Task<object>` and
`PipeTo(sender, success, failure)`; faults are now mapped uniformly in a `MapFault` failure
continuation (`SiteScopeViolationException` -> `ManagementUnauthorized`, otherwise
`ManagementError`), which also unwraps `AggregateException`. Regression test:
`UnknownCommandType_FaultMappedToManagementError`. Existing success/error/unauthorized
mapping tests confirm behaviour is preserved.

### ManagementService-005 — ManagementActor declares no supervision strategy

| | |
|--|--|
| Severity | Low |
| Category | Akka.NET conventions |
| Status | Resolved |
| Location | `src/ZB.MOM.WW.ScadaBridge.ManagementService/ManagementActor.cs:33` |

**Description**

The project conventions call for explicit supervision strategies (Resume for coordinator
actors). `ManagementActor` is a long-lived coordinator-style actor but overrides no
`SupervisorStrategy` and defines no `PreRestart`/`PostRestart` behaviour. In practice it
spawns no children so the default strategy is rarely exercised, but an explicit strategy
should still be declared for clarity and to match the documented convention; it also matters
if children are added later (e.g. if finding 004 introduces worker actors).

**Recommendation**

Add an explicit `protected override SupervisorStrategy SupervisorStrategy()` returning a
Resume-based strategy, consistent with other central coordinator actors.

**Resolution**

Resolved 2026-05-16 (commit pending). Confirmed: `ManagementActor` declared no
`SupervisorStrategy`. Added a `public static SupervisorStrategy CreateSupervisorStrategy()`
factory returning an unbounded `OneForOneStrategy` with a `Directive.Resume` decider, and a
`protected override SupervisorStrategy()` that delegates to it — matching the Resume-based
convention of `CentralCommunicationActor`/`SiteCommunicationActor`. The actor spawns no
children today, so this is a forward-looking correctness fix. Regression tests:
`CreateSupervisorStrategy_ReturnsOneForOneStrategy`,
`CreateSupervisorStrategy_ResumesOnArbitraryException`,
`CreateSupervisorStrategy_ResumesIndefinitely` (new `ManagementActorSupervisionTests.cs`).

### ManagementService-006 — JsonDocument instances never disposed in the HTTP endpoint

| | |
|--|--|
| Severity | Medium |
| Category | Performance & resource management |
| Status | Resolved |
| Location | `src/ZB.MOM.WW.ScadaBridge.ManagementService/ManagementEndpoints.cs:83`, `:112` |

**Description**

`JsonDocument` is `IDisposable` (it rents buffers from a pooled `ArrayPool`). `HandleRequest`
parses the request body into `doc` at line 83 and never disposes it, and line 112
(`JsonDocument.Parse("{}")`) allocates a second document inline that is also never disposed.
Every management HTTP call therefore leaks pooled buffers, increasing GC pressure and pool
churn under load.

**Recommendation**

Wrap the parsed document in `using var doc = ...`. For the empty-payload fallback, avoid
allocating a `JsonDocument` entirely — deserialize from the literal string `"{}"`/an empty
object, or restructure so the fallback path does not parse a throwaway document.

**Resolution**

Resolved 2026-05-16 (commit pending). Confirmed: the request `JsonDocument` was never
disposed and the empty-payload path allocated a second throwaway `JsonDocument`. Extracted
request parsing into a testable `ManagementEndpoints.ParseCommand` helper that wraps the
document in `using`; the missing-payload case now deserializes from the `"{}"` literal
string rather than parsing a throwaway document. Regression tests:
`ParseCommand_WithExplicitPayload_DeserializesIntoCommandType`,
`ParseCommand_WithMissingPayload_DeserializesParameterlessCommand`,
`ParseCommand_WithInvalidJson_ReturnsFailure`,
`ParseCommand_WithMissingCommandField_ReturnsFailure`,
`ParseCommand_WithUnknownCommand_ReturnsFailure`.

### ManagementService-007 — Inconsistent and cycle-prone serialization of repository entities

| | |
|--|--|
| Severity | Medium |
| Category | Code organization & conventions |
| Status | Resolved |
| Location | `src/ZB.MOM.WW.ScadaBridge.ManagementService/ManagementActor.cs:67`; `src/ZB.MOM.WW.ScadaBridge.ManagementService/ManagementEndpoints.cs:113` |

**Description**

The actor serializes every command result with `Newtonsoft.Json` (`JsonConvert.SerializeObject`)
while the HTTP endpoint deserializes payloads with `System.Text.Json`. Beyond the
inconsistency, `JsonConvert.SerializeObject` is applied directly to EF-backed entities
returned by repositories (e.g. `Site`, `DataConnection`, `NotificationList` with a
`Recipients` collection, `Template` with children). With default Newtonsoft settings any
bidirectional navigation property produces a `JsonSerializationException` for self-referencing
loops, and even without cycles this serializes lazy/navigation state the CLI does not expect.

**Recommendation**

Standardise on one serializer (the rest of the HTTP path uses `System.Text.Json`). Serialize
explicit DTOs / projections rather than EF entities, or configure
`ReferenceLoopHandling.Ignore` and ignore navigation properties. Verify that handlers
returning rich entity graphs (`HandleGetTemplate`, `HandleUpdateNotificationList`) round-trip
correctly.

**Resolution**

Resolved 2026-05-16 (commit pending). Confirmed: the actor serialized results with
`Newtonsoft.Json` (not even a direct package reference) while the HTTP endpoint uses
`System.Text.Json`. Standardised the actor on `System.Text.Json` via a new
`ManagementActor.SerializeResult` helper using a shared `JsonSerializerOptions` with
`ReferenceHandler.IgnoreCycles` (cycle-safe for EF entity graphs) and camelCase naming
(matches the CLI's case-insensitive deserializer). Removed the `Newtonsoft.Json` import.
Regression tests: `SerializeResult_WithCyclicGraph_DoesNotThrow`,
`SerializeResult_UsesCamelCasePropertyNames`.

### ManagementService-008 — HandleResolveRoles constructs RoleMapper manually instead of via DI

| | |
|--|--|
| Severity | Low |
| Category | Correctness & logic bugs |
| Status | Resolved |
| Location | `src/ZB.MOM.WW.ScadaBridge.ManagementService/ManagementActor.cs:285` |

**Description**

Every other handler resolves its collaborators from the scoped `IServiceProvider`.
`HandleResolveRoles` instead does `new RoleMapper(sp.GetRequiredService<ISecurityRepository>())`,
bypassing DI. If `RoleMapper` ever gains a dependency, caching, or options, this hand-built
instance silently diverges from the DI-registered one. It is also inconsistent with
`ManagementEndpoints`, which resolves `RoleMapper` from DI.

**Recommendation**

Resolve `RoleMapper` via `sp.GetRequiredService<RoleMapper>()` like every other dependency.

**Resolution**

Resolved 2026-05-16 (commit pending). Confirmed: `HandleResolveRoles` did
`new RoleMapper(sp.GetRequiredService<ISecurityRepository>())`, bypassing the
`AddScoped<RoleMapper>()` DI registration. The hand-built `RoleMapper` lived only inside
`HandleResolveRoles`, which is itself the dead-code dispatch path removed under finding 011
(the two-step ResolveRoles flow is retired). Resolving 011 by deleting the
`ResolveRolesCommand` dispatch case and `HandleResolveRoles` handler also removes the only
manually-constructed `RoleMapper` in the module, so the DI-bypass no longer exists. No
remaining `new RoleMapper` in `src/ZB.MOM.WW.ScadaBridge.ManagementService`. Regression covered by
`ResolveRolesCommand_IsNoLongerDispatched_ReturnsManagementError`.

### ManagementService-009 — Audit logging applied inconsistently across mutating handlers

| | |
|--|--|
| Severity | Low — re-triaged from Medium; the claimed audit gap does not exist (see Description), leaving only an undocumented-convention issue. |
| Category | Code organization & conventions |
| Status | Resolved |
| Location | `src/ZB.MOM.WW.ScadaBridge.ManagementService/ManagementActor.cs:357`, `:1134`, `:1085`, `:526`, `:1275` |

**Description**

The design doc states "All mutating operations are audit logged." Some handlers call
`AuditAsync` explicitly (`HandleCreateInstance`, `HandleCreateSite`, all repository-direct
external-system/notification/security/area mutations), but the handlers that delegate to a
domain service do **not** — `HandleCreateTemplate`/`HandleUpdateTemplate`/`HandleDeleteTemplate`,
all template-member handlers (`HandleAddAttribute` ... `HandleDeleteComposition`), template-folder
handlers, shared-script handlers, `HandleDeployArtifacts`, `HandleDeployInstance`,
`HandleEnableInstance`/`Disable`/`Delete`, and the instance-binding/override handlers.

**Re-triage (2026-05-16):** the original finding claimed this "creates a real risk of silent
audit gaps for template authoring and deployment operations." That claim was verified against
the actual sources and is **false**. Every domain service the delegating handlers call —
`TemplateService`, `SharedScriptService`, `InstanceService`, `AreaService`, `SiteService`,
`TemplateFolderService`, `DeploymentService`, `ArtifactDeploymentService` — injects
`IAuditService` and calls `LogAsync` on every mutation (`grep` confirms an `_auditService.LogAsync`
call after each `Create`/`Update`/`Delete` in `TemplateService.cs`, `DeploymentService.cs`,
`ArtifactDeploymentService.cs`, etc.). There is therefore no audit gap; if anything, adding
explicit `AuditAsync` to a delegating handler would *double-log*. The genuine issue is purely
organizational: the two-layer split (actor audits repo-direct mutations, services audit their
own) was undocumented, which is what made it look risky. This is a Low-severity
code-organization issue, not a Medium error-handling/resilience defect.

**Recommendation**

Document the chosen contract so the split cannot be misread as a gap. (The original
alternative — moving all auditing into the actor — would require un-auditing eight services
and is not warranted given they already audit correctly.)

**Resolution**

Resolved 2026-05-16 (commit pending). Re-triaged to Low / Code organization after verifying
all eight delegated-to services audit internally — no audit gap exists. Documented the
two-layer audit contract in an XML `<remarks>` block on `ManagementActor.AuditAsync`:
repository-direct mutations call `AuditAsync`; service-delegating handlers must not, because
the services own auditing and a duplicate call would double-log. No behavioural change, so
no new regression test; existing `CreateInstanceCommand_WithDeploymentRole_ReturnsSuccess`
covers the explicit-audit path.

### ManagementService-010 — ManagementServiceOptions.CommandTimeout is defined but never used

| | |
|--|--|
| Severity | Low |
| Category | Design-document adherence |
| Status | Resolved |
| Location | `src/ZB.MOM.WW.ScadaBridge.ManagementService/ManagementServiceOptions.cs:5`; `src/ZB.MOM.WW.ScadaBridge.ManagementService/ManagementEndpoints.cs:16` |

**Description**

`ManagementServiceOptions.CommandTimeout` is bound from configuration in
`ServiceCollectionExtensions`, but no code reads it. The HTTP endpoint instead hard-codes
`AskTimeout = TimeSpan.FromSeconds(30)`. The design doc describes the options section as
"Reserved for future configuration — e.g., command timeout overrides", yet a concrete
`CommandTimeout` property already exists and is silently ignored, so an operator who sets it
in `appsettings.json` gets no effect.

**Recommendation**

Either consume `ManagementServiceOptions.CommandTimeout` in `ManagementEndpoints.HandleRequest`
(inject `IOptions<ManagementServiceOptions>`), or remove the property until it is wired up so
configuration cannot be set with no effect.

**Resolution**

Resolved 2026-05-16 (commit pending). Confirmed: `ManagementEndpoints` hard-coded
`AskTimeout = TimeSpan.FromSeconds(30)` and never read `ManagementServiceOptions.CommandTimeout`.
`HandleRequest` now resolves `IOptions<ManagementServiceOptions>` from `context.RequestServices`
and computes the Ask timeout via a new `ManagementEndpoints.ResolveAskTimeout` helper, which
returns the configured `CommandTimeout` when strictly positive and otherwise falls back to the
30s default (guarding against a misconfigured zero/negative value that would fail every call).
Regression tests: `ResolveAskTimeout_UsesConfiguredCommandTimeout`,
`ResolveAskTimeout_WithNullOptions_FallsBackToDefault`,
`ResolveAskTimeout_WithNonPositiveTimeout_FallsBackToDefault`.

### ManagementService-011 — ResolveRolesCommand dispatch path is stale dead code

| | |
|--|--|
| Severity | Low |
| Category | Correctness & logic bugs |
| Status | Resolved |
| Location | `src/ZB.MOM.WW.ScadaBridge.ManagementService/ManagementActor.cs:273`, `:283` |

**Description**

The design doc states the HTTP endpoint "collapses the CLI's previous two-step flow
(ResolveRoles + actual command) into a single HTTP round-trip", and indeed `ManagementEndpoints`
performs LDAP auth and role resolution itself before dispatching. The `ResolveRolesCommand`
case in `DispatchCommand` is therefore unreachable from the HTTP path. It remains reachable
only via a raw ClusterClient sender, but a caller able to send `ResolveRolesCommand` could
enumerate role mappings for arbitrary LDAP groups with no role requirement
(`GetRequiredRole` returns null for it) — a minor information-disclosure surface for a path
the design says no longer exists.

**Recommendation**

If the two-step flow is genuinely retired, remove `ResolveRolesCommand`, its handler, and the
class. If it must remain for non-HTTP clients, document why and confirm exposing role-mapping
data unauthenticated is intended.

**Resolution**

Resolved 2026-05-16 (commit pending). Confirmed dead path: a repository-wide search found no
`ResolveRolesCommand` sender outside `ManagementActor` itself — the CLI and HTTP endpoint
perform LDAP auth + role resolution inline. Removed the `ResolveRolesCommand` dispatch case
and the `HandleResolveRoles` handler from `ManagementActor`; a stray ClusterClient sender now
falls through to the `NotSupportedException` default and gets a uniform `ManagementError`
(closing the unauthenticated role-mapping enumeration surface, since `GetRequiredRole`
returned null for it). A code comment at the former dispatch site documents the intentional
omission. Note: the `ResolveRolesCommand` *record* itself lives in
`src/ZB.MOM.WW.ScadaBridge.Commons/Messages/Management/SecurityCommands.cs` and was left in place — that
file is outside this module's permitted edit scope; deleting the orphan record should be done
as a Commons-module follow-up. With the handler removed it is now an inert,
registry-only type with no behaviour. Regression test:
`ResolveRolesCommand_IsNoLongerDispatched_ReturnsManagementError`.

### ManagementService-012 — ManagementEnvelope carries a loosely-typed object payload

| | |
|--|--|
| Severity | Low |
| Category | Akka.NET conventions |
| Status | Deferred |
| Location | `src/ZB.MOM.WW.ScadaBridge.Commons/Messages/Management/ManagementEnvelope.cs:7`; `src/ZB.MOM.WW.ScadaBridge.ManagementService/ManagementActor.cs:132` |

**Description**

`ManagementEnvelope.Command` is typed `object`, so the actor relies on a large open-ended
`switch` with a `NotSupportedException` default for unknown types. While the individual
command records are immutable, `object` defeats compile-time exhaustiveness — adding a new
command record produces no compiler signal that `DispatchCommand` (and `GetRequiredRole`)
need updating, and a typo or unregistered command surfaces only as a runtime exception. The
message contract is also harder to evolve safely under the additive-only rule.

**Recommendation**

Introduce a marker interface (e.g. `IManagementCommand`) implemented by every command record
and type the envelope payload as that interface. This documents the contract, lets analyzers
flag unhandled cases, and keeps `ManagementCommandRegistry`'s reflection scan precise.

**Resolution**

Deferred 2026-05-16. Finding verified as genuine: `ManagementEnvelope.Command` is typed
`object` and the recommended `IManagementCommand` marker-interface fix is sound. However, the
fix cannot be implemented within the `ManagementService` module: both `ManagementEnvelope` and
all ~50 `*Command` records live in `src/ZB.MOM.WW.ScadaBridge.Commons/Messages/Management/` (17 files),
which is outside this work item's permitted edit scope (`src/ZB.MOM.WW.ScadaBridge.ManagementService/**`,
its tests, and this findings file only). Adding the marker interface, retyping the envelope,
and having `ManagementCommandRegistry` constrain its reflection scan to `IManagementCommand`
implementers is a cohesive Commons-module change and must be done there — also so the Commons
message-contract additive-only evolution rules are respected. Deferred to a Commons-module
work item; no `ManagementService`-local change is appropriate.

### ManagementService-013 — No tests for site-scope enforcement, the HTTP endpoint, or DebugStreamHub

| | |
|--|--|
| Severity | Medium |
| Category | Testing coverage |
| Status | Resolved |
| Location | `tests/ZB.MOM.WW.ScadaBridge.ManagementService.Tests/ManagementActorTests.cs:1` |

**Description**

`ManagementActorTests` covers role-based authorization, success/error mapping, and correlation
IDs thoroughly, but several critical paths are untested: (a) site-scope enforcement —
`EnforceSiteScope`/`EnforceSiteScopeForInstance` and `SiteScopeViolationException` -> `Unauthorized`
mapping have no test, which is why the gaps in findings 001/002 went unnoticed; (b)
`ManagementEndpoints` — Basic Auth decoding, malformed-header handling, LDAP/role resolution,
command deserialization, and HTTP status mapping have zero coverage; (c) `DebugStreamHub`
authentication, subscribe/unsubscribe lifecycle, and `ManagementCommandRegistry.Resolve` are
untested. The `Envelope` test helper always passes `Array.Empty<string>()` for permitted
sites, so no test ever exercises a site-scoped user.

**Recommendation**

Add tests that exercise a site-scoped Deployment user against in-scope and out-of-scope
targets for instance and site operations, asserting `ManagementUnauthorized` on violations.
Add `WebApplicationFactory`-based tests for `ManagementEndpoints` covering auth failures,
malformed bodies, unknown commands, and the 200/400/403/401/504 mappings.

**Resolution**

Resolved 2026-05-16 (commit pending). The site-scope and `DebugStreamHub` coverage gaps
were closed by the resolution of findings 001/002/003 (the `ScopedEnvelope` helper plus the
`*_OutOfScopeForSiteScopedUser_ReturnsUnauthorized` tests and `DebugStreamHubTests`). The
remaining HTTP-endpoint gap is now covered by a new `ManagementEndpointsTests.cs` exercising
`ManagementEndpoints.ParseCommand` — command deserialization, malformed JSON, missing
`command` field, and unknown commands. Full `WebApplicationFactory` auth-flow tests were
deliberately not added: `ManagementEndpoints` depends on `LdapAuthService` and live LDAP
infrastructure, so the testable command-parsing/dispatch logic was extracted into the pure
`ParseCommand` helper and covered instead. Tests: `ParseCommand_*` (5),
`SerializeResult_*` (2), `UnknownCommandType_FaultMappedToManagementError`, plus the
pre-existing site-scope and DebugStreamHub suites. `dotnet test` -> 48 passed.

### ManagementService-014 — HandleQueryDeployments bypasses site-scope enforcement

| | |
|--|--|
| Severity | High |
| Category | Security |
| Status | Resolved |
| Location | `src/ZB.MOM.WW.ScadaBridge.ManagementService/ManagementActor.cs:306`, `:1174` |

**Description**

`QueryDeploymentsCommand` is gated to the `Deployment` role by `GetRequiredRole`
(`ManagementActor.cs:170`–`:177`), and the design document's Authorization section states
"Site scoping is enforced for site-scoped Deployment users" and explicitly lists deployments
among the Deployment-role operations. `HandleQueryDeployments` makes no call to
`EnforceSiteScope` / `EnforceSiteScopeForInstance` / `EnforceSiteScopeForIdentifier`: with no
`InstanceId` it returns `IDeploymentManagerRepository.GetAllDeploymentRecordsAsync()` (every
deployment record across all sites), and with an `InstanceId` it returns that instance's
deployment history with no check that the instance's site is within the caller's permitted
set. A site-scoped Deployment user scoped to site A can therefore enumerate deployment
records for instances at site B — instance IDs, `DeployedBy` (operator usernames),
timestamps, deployment status, and `ErrorMessage` content — by issuing `QueryDeployments`
with or without an out-of-scope `InstanceId`. This is the same authorization-bypass class as
the resolved findings 001/002, on a handler that sweep did not cover; it is `DispatchCommand`'s
only `Deployment`-role handler with no scope enforcement.

**Recommendation**

Thread `AuthenticatedUser` into `HandleQueryDeployments` (the dispatch case at line 306
already has `user` in scope). When `cmd.InstanceId` is supplied, call
`EnforceSiteScopeForInstance` before querying. When it is not supplied, filter the returned
`DeploymentRecord` list to the caller's permitted sites — resolve each record's instance to
its `SiteId` (or join through a site-aware repository query) and drop records for sites
outside `PermittedSiteIds`, mirroring the `HandleListInstances` / `HandleListSites` filter
pattern. Add a regression test for a site-scoped user against in-scope and out-of-scope
instances.

**Resolution**

Resolved 2026-05-17 (commit pending). **Re-triage:** the finding understated the gap — it
claimed `QueryDeploymentsCommand` was already "gated to the `Deployment` role by
`GetRequiredRole` (lines 170–177)". Verified against the source: `QueryDeploymentsCommand`
appeared nowhere in `GetRequiredRole`, so it required *no* role at all — any authenticated
user could read every deployment record system-wide. Fix applied both gates: added
`QueryDeploymentsCommand` to the `Deployment`-role group in `GetRequiredRole`, and threaded
`AuthenticatedUser` into `HandleQueryDeployments` — the `InstanceId` branch now calls
`EnforceSiteScopeForInstance`; the unfiltered branch resolves each record's instance to its
`SiteId` (cached) and drops records outside `PermittedSiteIds`, mirroring `HandleListInstances`.
Regression tests: `QueryDeployments_WithDesignRole_ReturnsUnauthorized`,
`QueryDeployments_FilteredByOutOfScopeInstance_ReturnsUnauthorized`,
`QueryDeployments_FilteredByInScopeInstance_ReturnsRecords`,
`QueryDeployments_UnfilteredForSiteScopedUser_DropsOutOfScopeRecords`,
`QueryDeployments_UnfilteredForAdminUser_ReturnsAllRecords`.

### ManagementService-015 — HandleSetInstanceOverrides applies overrides non-atomically

| | |
|--|--|
| Severity | Medium |
| Category | Error handling & resilience |
| Status | Resolved |
| Location | `src/ZB.MOM.WW.ScadaBridge.ManagementService/ManagementActor.cs:647`–`:659` |

**Description**

`HandleSetInstanceOverrides` iterates `cmd.Overrides` and calls
`InstanceService.SetAttributeOverrideAsync` once per attribute, throwing
`InvalidOperationException` on the first `result.IsSuccess == false`. Each
`SetAttributeOverrideAsync` call persists independently, so if the command supplies five
overrides and the third fails (e.g. an unknown attribute name or a validation error), the
first two overrides are already committed to the configuration database while the caller
receives a `ManagementError`. The instance is left partially mutated in a state the operator
neither sees nor requested, and the per-instance operation lock referenced in the design's
deployment decisions does not protect against this because the partial writes are committed
before the throw. A retry of the same command then re-applies the already-applied overrides.

**Recommendation**

Make the multi-override mutation all-or-nothing: either validate every requested override up
front before applying any, or apply all overrides within a single transaction / unit-of-work
so a mid-batch failure rolls back the earlier writes. If `InstanceService` cannot offer a
batch method, at minimum document the partial-application behaviour on `SetInstanceOverridesCommand`
and have the handler report which overrides were applied before the failure so the caller
can reconcile.

**Resolution**

Resolved 2026-05-17 (commit pending). Confirmed: each `SetAttributeOverrideAsync` call
commits independently, so a mid-batch failure left earlier overrides persisted.
`HandleSetInstanceOverrides` now validates every requested attribute up front against the
instance's template (exists, not locked) and only begins applying once the whole batch is
known valid — eliminating the realistic partial-mutation failure modes (unknown / locked
attribute). `InstanceService` is outside this module's edit scope and offers no batch/
transactional method, so a genuine database fault mid-apply remains theoretically possible;
this residual is documented in a code comment on the handler. Regression tests:
`SetInstanceOverrides_WithOneInvalidAttribute_PersistsNoOverrides`,
`SetInstanceOverrides_AllValid_PersistsAllOverrides`.

### ManagementService-016 — Unexpected exception messages returned verbatim to HTTP callers

| | |
|--|--|
| Severity | Low |
| Category | Security |
| Status | Resolved |
| Location | `src/ZB.MOM.WW.ScadaBridge.ManagementService/ManagementActor.cs:121`–`:131` |

**Description**

`MapFault` maps any non-`SiteScopeViolationException` fault to
`new ManagementError(correlationId, cause.Message, "COMMAND_FAILED")`, and
`ManagementEndpoints.HandleRequest` returns that `Error` string directly in the HTTP 400
body. For handler-thrown `InvalidOperationException`s carrying a curated `result.Error`
message this is intended and safe. But the same path also surfaces the raw `.Message` of
unanticipated exceptions — a `SqlException` (which can include server/database names and
constraint details), a `DbUpdateException`, an `ArgumentException` from `Enum.Parse` on a
malformed `DataType`/`TriggerType` value, or a `NullReferenceException` — straight to the
external CLI/HTTP client. This is a minor internal-detail disclosure surface: the exception
text is already logged server-side with full context, so the client does not need the raw
message.

**Recommendation**

Distinguish handler-curated failures from unexpected faults. Have handlers throw a dedicated
exception type (e.g. `ManagementCommandException`) for messages that are safe to surface, and
in `MapFault` return that message for the known type while returning a generic
"An internal error occurred (CorrelationId=...)" string for everything else — the operator
can still correlate to the server log via the correlation ID.

**Resolution**

Resolved 2026-05-17 (commit pending). Added a `ManagementCommandException` type for curated,
caller-safe failures and converted every curated `throw` in `ManagementActor` (the 34
`result.Error` rethrows and 15 "not found" / delete-blocked messages) to it. `MapFault` now
returns the message verbatim only for `ManagementCommandException`; any other fault (DB
errors, `Enum.Parse` `ArgumentException`, `NullReferenceException`, the unknown-command
`NotSupportedException`, etc.) yields a generic `"An internal error occurred
(CorrelationId=...)"` string while the full exception is still logged server-side. Regression
tests: `UnexpectedFault_ReturnsGenericMessage_NotRawExceptionText`,
`CuratedHandlerFailure_SurfacesTheCuratedMessage`; the two pre-existing
`*_WhenRepoThrows_*` tests were updated to assert the raw repository message is no longer
leaked.

### ManagementService-017 — QueryDeploymentsCommand has no test coverage

| | |
|--|--|
| Severity | Low |
| Category | Testing coverage |
| Status | Resolved |
| Location | `tests/ZB.MOM.WW.ScadaBridge.ManagementService.Tests/ManagementActorTests.cs:1` |

**Description**

`QueryDeploymentsCommand` / `HandleQueryDeployments` is exercised by no test in
`ManagementActorTests`. There is no test that it requires the `Deployment` role, no test of
the `InstanceId`-filtered versus unfiltered branches, and — because the handler performs no
site-scope enforcement at all — no test that would have caught finding 014. The deployment
query is one of the operations the design's Authorization section calls out for site
scoping, yet it is the only `Deployment`-role command with neither an authorization test nor
a site-scope test.

**Recommendation**

Add tests for `QueryDeploymentsCommand`: a role test (Design/no-role caller →
`ManagementUnauthorized`), branch coverage for the `InstanceId`-filtered and unfiltered
repository calls, and — once finding 014 is fixed — site-scope tests for a site-scoped
Deployment user against in-scope and out-of-scope deployment records.

**Resolution**

Resolved 2026-05-17 (commit pending). Added seven `QueryDeployments_*` tests to
`ManagementActorTests`: role gate (`_WithDesignRole_ReturnsUnauthorized`), the
`InstanceId`-filtered and unfiltered branches (`_FilteredByInstanceId_ReturnsInstanceRecords`,
`_UnfilteredWithDeploymentRole_ReturnsAllRecords`), and site-scope coverage for a site-scoped
Deployment user and an Admin user, in- and out-of-scope
(`_FilteredByOutOfScopeInstance_ReturnsUnauthorized`, `_FilteredByInScopeInstance_ReturnsRecords`,
`_UnfilteredForSiteScopedUser_DropsOutOfScopeRecords`, `_UnfilteredForAdminUser_ReturnsAllRecords`).

### ManagementService-018 — QueryAuditLogCommand has no role gate

| | |
|--|--|
| Severity | High |
| Category | Security |
| Status | Resolved |
| Location | `src/ZB.MOM.WW.ScadaBridge.ManagementService/ManagementActor.cs:153`–`:207`, `:336`, `:1302` |

**Description**

`QueryAuditLogCommand` is dispatched at line 336 to `HandleQueryAuditLog`, which calls
`ICentralUiRepository.GetAuditLogEntriesAsync(...)` with no role check, no site-scope
check, and no actor filter. `GetRequiredRole` (lines 153–207) does not list
`QueryAuditLogCommand`, so it falls through to the `_ => null` case — i.e. "read-only
queries — any authenticated user". The parallel `/api/audit/query` endpoint in
`AuditEndpoints.HandleQuery` correctly enforces `AuthorizationPolicies.OperationalAuditRoles`
(`{ "Admin", "Audit", "AuditReadOnly" }`), so a CLI authenticated as a user with only the
`Deployment` role — or no roles at all — is rejected at `/api/audit/query` but can read
the *same* audit log table through `/management` by sending `QueryAuditLogCommand`. The
two surfaces enforce different permissions on the same data; the older
ManagementActor-routed path is the looser one. The audit log records every script-trust-
boundary action and is sensitive operationally — it should not be readable by a default
authenticated user.

This is the same authorization-bypass class as findings 001/002/014 and was missed in
that sweep because `QueryAuditLogCommand` (legacy `Action`/`EntityType` filter) is a
separate command from the new keyset-paged `IAuditLogRepository.QueryAsync` path the
`/api/audit/query` endpoint uses.

**Recommendation**

Add `QueryAuditLogCommand` to `GetRequiredRole`. The natural fit is a new
`"OperationalAudit"`-style role group — but `GetRequiredRole` returns a single string and
the project's existing role gates do too (`Admin`/`Design`/`Deployment`). Two equally
defensible options:

1. Add `QueryAuditLogCommand` to the `Admin`-required group — strict, mirrors that
   `AuditExportRoles` includes `Admin`. The CLI's CLI-017/018 audit work uses
   `/api/audit/query`, so `QueryAuditLogCommand` may be effectively orphaned anyway.
2. Extend `GetRequiredRole` to return a role *set* and add an `AuditRoles` group equal to
   `AuthorizationPolicies.OperationalAuditRoles`, so the two surfaces converge.

Recommended: option 1 plus a deprecation comment on `QueryAuditLogCommand` pointing at
`/api/audit/query` — the legacy command's filter shape is a subset of the new endpoint's,
so the ManagementActor route is redundant. Add a regression test asserting that a
no-role / `Deployment`-only caller gets `ManagementUnauthorized` for `QueryAuditLogCommand`.

**Resolution**

Resolved 2026-05-28 (commit pending) per recommendation option 1. `QueryAuditLogCommand`
was added to the Admin-required group in `GetRequiredRole`, with an inline comment
documenting the deliberate strictness vs. the keyset-paged `/api/audit/query`
(`OperationalAuditRoles`) and pointing new audit consumers at the REST endpoint.
The CentralUI `ConfigurationAuditLog` page reads via `ICentralUiRepository` directly
(not through this command), so the gate tightening does not break any UI flow. Two
regression tests pin the new behaviour:
`QueryAuditLogCommand_WithNoRoles_ReturnsUnauthorized` and
`QueryAuditLogCommand_WithDeploymentRole_ReturnsUnauthorized` — both fail on the
pre-fix code (the command fell through to "any authenticated user") and pass after.

### ManagementService-019 — AuditEndpoints builds PermittedSiteIds but never enforces them

| | |
|--|--|
| Severity | Medium |
| Category | Security |
| Status | Resolved |
| Location | `src/ZB.MOM.WW.ScadaBridge.ManagementService/AuditEndpoints.cs:358`–`:368`, `:397`–`:437` |

**Description**

`AuditEndpoints.AuthenticateAsync` resolves the caller's roles AND `PermittedSiteIds` and
wraps them in an `AuthenticatedUser` (lines 358–366), but the returned `AuthenticatedUser`
is then only used for the `HasAnyRole(...)` role check on lines 114 and 163 — its
`PermittedSiteIds` are never read. `ParseFilter` (line 397) accepts the caller-supplied
`sourceSiteId=...` query string verbatim and passes it straight into the
`IAuditLogRepository.QueryAsync` filter. A user whose `Audit` (or `AuditReadOnly`) role
mapping carries scope rules — e.g. `AuditReadOnly` scoped to "plant-a" — can still ask
for `sourceSiteId=plant-b` and get back rows for plant-b.

Today this gap is partially benign because the design treats `Audit`/`AuditReadOnly` as
non-site-scoped roles (`Component-AuditLog.md` does not list site scoping for the audit
permissions, and the LDAP role mapping UI does not currently surface site scope rules
for those roles). But (a) the `RoleMapper` will silently honour scope rules attached to
any role, including `Audit`, so an operator who *does* configure them gets a UI that
says "scoped" and an endpoint that ignores the scope — a contract violation; (b) the
`Admin` role's `PermittedSiteIds` are always empty (system-wide), so enforcing for the
other roles is cheap. The asymmetry with the `/management` endpoint — which routes every
site-targeted command through `EnforceSiteScope` — is also a maintenance hazard.

**Recommendation**

Decide explicitly whether the audit endpoints honour site scope. Two options:

1. **Honour scope** — in `HandleQuery` / `HandleExport`, after the role check, intersect
   the caller-supplied `filter.SourceSiteIds` with `user.PermittedSiteIds`. If the
   caller supplied no `sourceSiteId` and `PermittedSiteIds` is non-empty, restrict to
   `PermittedSiteIds`. If the intersection is empty, return an empty page (or a 403 if
   the caller explicitly asked for an out-of-scope site).
2. **Document the intentional bypass** — drop the `PermittedSiteIds` field from the
   `AuthenticatedUser` constructed in `AuthenticateAsync` (or comment it as "ignored —
   audit roles are not site-scoped") so the code stops carrying a value it does not
   read, and add an XML doc note on the endpoint class that audit roles are always
   system-wide by design.

Recommended: option 1, mirroring the `ManagementActor` pattern — same security posture
across both surfaces. Add a regression test that a site-scoped `AuditReadOnly` user
filtering on an out-of-scope site gets a 403 (or an empty page).

**Resolution**

Resolved 2026-05-28 (commit pending) per recommendation option 1. Added a public
helper `AuditEndpoints.ApplySiteScope(AuditLogQueryFilter, AuthenticatedUser)` that
returns the restricted filter (or `null` when the caller explicitly asks for an
out-of-scope site). Three cases:
- Empty `PermittedSiteIds` (Admin or any unscoped role) → filter returned unchanged.
- Scoped user with empty caller filter → `SourceSiteIds` set to the permitted set.
- Scoped user with explicit `SourceSiteIds` → intersected with the permitted set;
  empty intersection returns `null` so `HandleQuery` / `HandleExport` emit a 403
  rather than silently producing an empty page.

Both `HandleQuery` and `HandleExport` now call the helper after the role check and
short-circuit to `Forbidden("OperationalAudit"|"AuditExport")` on a `null` result.
Audit roles remain non-site-scoped by design (the design doc unchanged), but the
helper honours scope rules if an operator attaches them via the LDAP-mapping UI,
matching the existing `ManagementActor` pattern. Regression tests added in
`AuditEndpointsTests.ApplySiteScope_*` (5 tests): system-wide unchanged,
empty-caller-filter restricted, in-scope kept verbatim, out-of-scope returns null,
mixed-set intersected.

### ManagementService-020 — UpdateSmtpConfig returns and audits the SMTP Credentials field verbatim

| | |
|--|--|
| Severity | Medium |
| Category | Security |
| Status | Resolved |
| Location | `src/ZB.MOM.WW.ScadaBridge.ManagementService/ManagementActor.cs:1136`–`:1153` |

**Resolution (2026-05-28):** Added `SmtpConfigPublicShape` projection that
returns every non-secret field plus a `HasCredentials` bool — never the
`Credentials` field itself. Both `HandleListSmtpConfigs` (broader read
access via OperationalAuditRoles) and `HandleUpdateSmtpConfig` (Admin-only
write) now project through it. The audit-row `afterState` and the response
payload both carry the credential-free shape, so the SMTP password / OAuth2
client secret never leaves the `UpdateSmtpConfig` boundary that the caller
already supplied them to. ManagementService 100/100 tests still pass.
Follow-up to tag `SmtpConfiguration.Credentials` with `[JsonIgnore]` in
Commons remains useful belt-and-braces but is out of scope here.

**Description**

`HandleUpdateSmtpConfig` reads the existing `SmtpConfiguration` entity, applies the
incoming command, and then **(a)** passes the full `config` object as the `afterState`
to `AuditAsync` (line 1151) — meaning the SMTP credential string is persisted in the
audit log — and **(b)** returns the full `config` to the caller (line 1152), which is
serialized via `SerializeResult` and sent back over HTTP. `SmtpConfiguration.Credentials`
carries the SMTP-Auth password (for `Basic`) or the OAuth2 client secret (for
`OAuth2ClientCredentials`); `SmtpConfiguration` has no `[JsonIgnore]` on this field
and `SerializeResult`'s `JsonSerializerOptions` does not exclude it. The pattern
parallels what ConfigurationDatabase-012 fixed for inbound API keys: a credential
artifact must not be echoed back through every read/audit path.

The credential is supplied by the operator in `UpdateSmtpConfigCommand.Credentials`,
so the caller already has it. But (1) anyone with read access to the audit log
(`OperationalAuditRoles`) can now retrieve every SMTP credential change verbatim — a
strictly larger blast radius than `Admin`-only `UpdateSmtpConfig`. (2) The serialized
`config` echo means the credential moves over the wire in the response even though the
caller has no need for it. (3) Any future read path that returns
`SmtpConfiguration` — `ListSmtpConfigsCommand` already does at line 1130 — will leak
the stored credential too.

**Recommendation**

Three changes, in order of priority:

1. In `HandleUpdateSmtpConfig` and `HandleListSmtpConfigs`, project to a credential-free
   shape before returning — e.g. `new { config.Id, config.Host, config.Port,
   config.AuthType, config.FromAddress, config.TlsMode }`. Match the
   `HandleListApiKeys` pattern.
2. In `AuditAsync` for the SMTP path, pass a credential-free `afterState` (the same
   anonymous shape). The fact that *something* changed is auditable; the secret value
   is not.
3. Tag `SmtpConfiguration.Credentials` with `[JsonIgnore]` in Commons (out-of-scope edit
   for this module, but worth a follow-up). Alternatively, configure
   `ResultSerializerOptions` with a property name policy that skips a known set of
   credential field names — but a per-entity projection is cleaner.

Add regression tests: `UpdateSmtpConfig_DoesNotEchoCredentialsInResponse` and
`UpdateSmtpConfig_DoesNotPersistCredentialsInAuditLog`.

### ManagementService-021 — Transport bundle handlers have zero test coverage

| | |
|--|--|
| Severity | Medium |
| Category | Testing coverage |
| Status | Resolved |
| Location | `tests/ZB.MOM.WW.ScadaBridge.ManagementService.Tests/ManagementActorTests.cs:1`; `src/ZB.MOM.WW.ScadaBridge.ManagementService/ManagementActor.cs:1717`–`:1897` |

**Resolution (2026-05-28):** Added six `ManagementActorTests` cases covering the load-bearing bundle behaviours. Role gating: `ExportBundleCommand_WithAdminRole_ReturnsUnauthorized` (Export needs Design), `PreviewBundleCommand_WithDesignRole_ReturnsUnauthorized`, and `ImportBundleCommand_WithDesignRole_ReturnsUnauthorized` (Preview/Import need Admin). Name resolution: `ExportBundleCommand_WithUnknownTemplateName_ReturnsManagementError` proves the `ResolveIds` `ManagementCommandException` surfaces verbatim with the missing entity type + name. Handler logic: `ImportBundleCommand_WithBlockerRow_AbortsBeforeApply` seeds a `ConflictKind.Blocker` preview and asserts `IBundleImporter.ApplyAsync` is never called and the error names the blocker; `ImportBundleCommand_DuplicatePreviewItems_DedupePerEntityTypeAndName` seeds three rows for the same `(Template, "Dup")` key (Identical, Modified, Identical) and asserts only one resolution reaches `ApplyAsync` with last-write-wins (`Skip`, overriding the prior `Modified`/`Overwrite`). All bundle tests use substituted `IBundleExporter`/`IBundleImporter` via a new `AddBundleSubstitutes()` helper plus stub `GetAll*Async` repository returns. 106/106 ManagementService.Tests pass.

**Description**

The three Transport (#24) bundle handlers — `HandleExportBundle`, `HandlePreviewBundle`,
`HandleImportBundle` (~180 lines of handler logic at the bottom of `ManagementActor.cs`)
— have **no tests** in `ManagementActorTests`. Specifically untested:

1. **Role gating.** `ExportBundleCommand` requires `Design`; `PreviewBundleCommand` and
   `ImportBundleCommand` require `Admin`. No test asserts that the wrong role gets
   `ManagementUnauthorized`. CLI-017 / CLI-018 just landed around bundle plumbing — a
   future refactor that moves these commands between role groups in `GetRequiredRole`
   would silently regress the gate.
2. **Name resolution in `HandleExportBundle`.** The inner `ResolveIds<T>` helper raises
   `ManagementCommandException` for unknown names. The "all entity types" branch
   (`cmd.All == true`) and the "missing name" branch are both untested.
3. **`HandleImportBundle` blocker rejection.** The handler aborts before `ApplyAsync`
   when any `ConflictKind.Blocker` row is present; the produced error message is
   curated and surfaced to the caller, but no test asserts the abort path or that the
   importer's `ApplyAsync` was not called.
4. **Resolution dedupe.** `HandleImportBundle` dedupes `(EntityType, Name)` keys
   last-write-wins — the dedupe is critical (CLI-014 was about it on the CLI side) but
   has no actor-side regression test.
5. **`DecodeBundle` failure modes** (empty/non-base64 input) — both branches return
   curated `ManagementCommandException` but neither is exercised.
6. **`ParseConflictPolicy`** for `"skip"`, `"overwrite"`, `"rename"`, and the invalid-
   value branch — all untested.

Given the size and reach of the bundle path (cross-cutting central configuration
import), this gap is materially larger than usual for new handler code.

**Recommendation**

Add an `ImportBundleHandlerTests` suite covering:
- role gating for all three commands (`Design`/`Admin` mismatch -> `ManagementUnauthorized`),
- `ExportBundleCommand(All: true)` happy-path,
- `ExportBundleCommand` with an unknown name -> `ManagementError`,
- `ImportBundleCommand` with a `Blocker` row -> `ManagementError` and `ApplyAsync` not called,
- `ImportBundleCommand` with duplicate preview items -> dedupe to one resolution per (type, name),
- `DecodeBundle` empty/invalid base64,
- `ParseConflictPolicy` all four branches.

Use NSubstitute for `IBundleImporter` / `IBundleExporter` (no need for a real bundle in
the actor tests; the bundle round-trip belongs in `Transport` tests).

### ManagementService-022 — Design doc is stale on Transport bundle commands, /api/audit/* endpoints, and CommandTimeout

| | |
|--|--|
| Severity | Low |
| Category | Design-document adherence |
| Status | Resolved |
| Location | `docs/requirements/Component-ManagementService.md:77`–`:175`, `:205`–`:209` |

**Resolution (2026-05-28):** Updated `Component-ManagementService.md` — added a "Transport (Bundle Import / Export)" entry under "Message Groups" listing `ExportBundle` (Design), `PreviewBundle`/`ImportBundle` (Admin); inserted a new "HTTP Audit API" section describing `GET /api/audit/query` (`OperationalAudit`) and `GET /api/audit/export` (`AuditExport`) with the site-scope intersection rule; and rewrote the Configuration table to document the now-wired `CommandTimeout` (30 s default, non-positive falls back). Legacy `QueryAuditLog` is now annotated as Admin-gated and superseded by the REST surface for #23.

**Description**

`Component-ManagementService.md` does not mention three pieces of shipped functionality:

1. **Transport (#24) bundle commands.** `ExportBundleCommand`, `PreviewBundleCommand`,
   and `ImportBundleCommand` are dispatched at `ManagementActor.cs:350`–`:352` and
   role-gated in `GetRequiredRole` (Design for Export; Admin for Preview/Import). The
   design doc's "Message Groups" section enumerates Templates, Instances, Sites, Data
   Connections, Deployments, External Systems, Notifications, Security, Audit Log,
   Shared Scripts, Database Connections, Inbound API Methods, Health, and Remote
   Queries — but has no "Transport" / "Bundles" group. The CLI now offers `bundle
   export`/`preview`/`import` (per the recent CLI-017/018 work) and points
   at these commands.
2. **`/api/audit/*` endpoints.** The doc's "HTTP Management API" section (line 52)
   describes only `POST /management`. `AuditEndpoints.MapAuditAPI()` adds
   `GET /api/audit/query` and `GET /api/audit/export` with their own auth-and-role
   path mirroring `ManagementEndpoints` (intentionally — see the `AuditEndpoints` XML
   docs), but the design doc gives no signal that the module exposes more than one
   route group, no per-endpoint role mapping table, and no mention that the response
   shape differs (keyset cursor vs. opaque page).
3. **`CommandTimeout`.** Line 209 still says "Reserved for future configuration —
   e.g., command timeout overrides", but ManagementService-010 wired the option through
   `ResolveAskTimeout`. The doc is stale.

**Recommendation**

Update `Component-ManagementService.md`:

- Add a "Transport" entry to "Message Groups" listing `ExportBundle`,
  `PreviewBundle`, `ImportBundle` with their per-command roles. Cross-reference
  `Component-Transport.md`.
- Add an "Audit Log HTTP API" subsection under "HTTP Management API" describing
  `GET /api/audit/query` (keyset cursor, `OperationalAuditRoles`) and
  `GET /api/audit/export` (csv/jsonl streaming, `AuditExportRoles`, parquet 501).
  Note the deliberate divergence in the source-site query-string key
  (`sourceSiteId` vs CentralUI's `site`).
- In the "Configuration" table, replace "Reserved for future configuration" with the
  actual `CommandTimeout` semantics: "Max time the HTTP endpoint will Ask the
  ManagementActor before returning HTTP 504; falls back to 30 s when unset or
  non-positive."

### ManagementService-023 — HandleQueryDeployments unfiltered branch is N+1 on instance lookup

| | |
|--|--|
| Severity | Low |
| Category | Performance & resource management |
| Status | Resolved |
| Location | `src/ZB.MOM.WW.ScadaBridge.ManagementService/ManagementActor.cs:1276`–`:1295` |

**Resolution (2026-05-28):** Swapped the per-`InstanceId` `GetInstanceByIdAsync` lookup loop for a single `templateRepo.GetAllInstancesAsync()` bulk fetch, projected into a `Dictionary<int, int?>` keyed by `Instance.Id`. The unfiltered branch now hits the configuration database exactly twice (deployment records + instances) regardless of fleet size. Existing test `QueryDeployments_UnfilteredForSiteScopedUser_DropsOutOfScopeRecords` was updated to mock the bulk path and to assert `GetInstanceByIdAsync` is no longer used on the unfiltered branch; new regression test `QueryDeployments_UnfilteredForSiteScopedUser_UsesBulkInstanceLoad_NotPerRecordLookup` pins `GetAllInstancesAsync` is invoked exactly once even with duplicate-instance records.

**Description**

The site-scoped unfiltered branch of `HandleQueryDeployments` (added under
ManagementService-014) reads every `DeploymentRecord` via `GetAllDeploymentRecordsAsync`,
then for each *unique* `record.InstanceId` calls
`ITemplateEngineRepository.GetInstanceByIdAsync` to resolve the instance's
`SiteId`. The handler caches results in `instanceSiteCache` so each instance is loaded
at most once per call, but for a fleet with N distinct instances having deployment
history, the handler still issues N round-trips to the configuration database to
authorize a single query. With a large deployment history the cumulative DB hit can be
material; it also runs every time a site-scoped user opens the deployments page.

This is acceptable in steady state today (sites tend to have small fleets and few
deployments) but is a textbook N+1 read pattern, and on a busy day for a site-scoped
operator the cost will dominate the request. Admin and system-wide Deployment users
correctly skip the loop (they hit only `GetAllDeploymentRecordsAsync`).

**Recommendation**

Add a batch-resolve method to `ITemplateEngineRepository` — e.g.
`Task<IDictionary<int, int>> GetInstanceSiteIdsAsync(IEnumerable<int> instanceIds)` —
backed by a single EF query
(`Instances.Where(i => instanceIds.Contains(i.Id)).Select(i => new { i.Id, i.SiteId })`).
`HandleQueryDeployments` would then issue exactly two queries on the unfiltered branch
(records + sites) regardless of fleet size. The change is additive to
`ITemplateEngineRepository` and out-of-module for the actual implementation, but the
handler change is local; a quick interim alternative is to project deployment records
to include the instance's `SiteId` at the repo level, which removes the second query
entirely.

Defer until a noticeable hot path emerges, but track it: this is the only N+1 in
`ManagementActor` once 002 / 014 are folded in.