# Code Review — SiteRuntime | Field | Value | |-------|-------| | Module | `src/ScadaLink.SiteRuntime` | | Design doc | `docs/requirements/Component-SiteRuntime.md` | | Status | Reviewed | | Last reviewed | 2026-05-16 | | Reviewer | claude-agent | | Commit reviewed | `9c60592` | | Open findings | 0 | ## Summary The SiteRuntime module is broadly well-structured: the actor hierarchy matches the design doc, supervision strategies are explicit, and the trigger/alarm evaluation logic is thorough. However the review surfaced one genuinely serious correctness defect — `Instance.SetAttribute` never routes writes to the Data Connection Layer for data-sourced attributes, contradicting a core design decision and silently turning device writes into local-only static overrides. Several other findings cluster around two themes: (1) actor-thread discipline is violated in a few hot paths (blocking `.GetAwaiter().GetResult()` calls on the actor thread, a fragile fixed-delay reschedule for redeployment), and (2) the site-local repositories reach into `SiteStorageService` private state via reflection and mint entity IDs with the non-deterministic `string.GetHashCode()`. Script execution runs on the default thread pool rather than a dedicated blocking dispatcher (the code acknowledges this in a comment but ships it anyway). Test coverage exists for the coordinator actors, persistence and scripting, but the short-lived execution actors, the replication actor, and the repositories are untested. ## Checklist coverage | # | Category | Examined | Notes | |---|----------|----------|-------| | 1 | Correctness & logic bugs | ✓ | SetAttribute mis-routing, deploy double-count, redeploy reschedule race. | | 2 | Akka.NET conventions | ✓ | Blocking on actor thread, script execution not on a dedicated dispatcher, premature success reply. | | 3 | Concurrency & thread safety | ✓ | `_attributes` dictionary shared with child actors by reference; `_executionCounter` is actor-confined (OK). | | 4 | Error handling & resilience | ✓ | Deploy reports Success before persistence; replicated artifact/S&F failures only logged (matches best-effort design). | | 5 | Security | ✓ | Trust-model validation is substring-based and weak; reflection used to read private fields. | | 6 | Performance & resource management | ✓ | Per-call SQLite connections (acceptable); CPU-bound scripts not interruptible by timeout. | | 7 | Design-document adherence | ✓ | SetAttribute DCL routing missing; staggered-startup and supervision otherwise conform. | | 8 | Code organization & conventions | ✓ | Repositories reflect into another class; synthetic IDs non-deterministic. | | 9 | Testing coverage | ✓ | No tests for ScriptExecutionActor, AlarmExecutionActor, SiteReplicationActor, or the two repositories. | | 10 | Documentation & comments | ✓ | Several XML comments describe behaviour the code does not implement (see findings). | ## Findings ### SiteRuntime-001 — `Instance.SetAttribute` never writes to the Data Connection Layer | | | |--|--| | Severity | High | | Category | Design-document adherence | | Status | Resolved | | Location | `src/ScadaLink.SiteRuntime/Scripts/ScriptRuntimeContext.cs:106`, `src/ScadaLink.SiteRuntime/Actors/InstanceActor.cs:204` | **Description** The design doc (Component-SiteRuntime.md, "GetAttribute / SetAttribute" and "Script Runtime API") states that `Instance.SetAttribute` on a *data-connected* attribute must send a write request to the DCL, which writes to the physical device, and that the in-memory value is **not** optimistically updated. For *static* attributes it updates memory and persists an override. The implementation makes no such distinction. `ScriptRuntimeContext.SetAttribute` unconditionally sends a `SetStaticAttributeCommand`, and `InstanceActor.HandleSetStaticAttribute` unconditionally treats every write as a static override: it mutates `_attributes`, publishes an `AttributeValueChanged` with hard-coded `"Good"` quality, notifies children, and persists a SQLite override. A script writing a data-sourced attribute therefore never reaches the device, the write failure can never be returned synchronously to the script, and the in-memory value diverges from the device until the next subscription update overwrites it. The persisted override is also wrong: data-sourced attributes should not have static overrides. **Recommendation** In `InstanceActor`, look up the target attribute in `_configuration.Attributes`. If it has a non-empty `DataSourceReference`, issue a DCL write (e.g. a `WriteTagRequest` to `_dclManager`) and surface success/failure to the caller; do not persist an override and do not optimistically mutate `_attributes`. Only attributes with no data source reference should follow the current static-override path. Consider splitting the message into `SetStaticAttributeCommand` vs `SetDataAttributeCommand`, or branching inside the handler. **Resolution** Resolved 2026-05-16 (``): `InstanceActor.HandleSetStaticAttribute` now resolves the target attribute's data binding from `_configuration`. Data-sourced attributes are routed via a new `HandleSetDataAttribute` that Asks the DCL with a `WriteTagRequest` and pipes the device-write outcome back to the caller as a `SetStaticAttributeResponse` — no override is persisted and `_attributes` is not optimistically mutated. Static attributes keep the override path and now also reply with a `SetStaticAttributeResponse`. `ScriptRuntimeContext.SetAttribute` is now `async Task` and Asks the Instance Actor, throwing `InvalidOperationException` on a failed device write so scripts get the failure synchronously. ### SiteRuntime-002 — `RouteInboundApiSetAttributes` always treats writes as static overrides | | | |--|--| | Severity | High | | Category | Correctness & logic bugs | | Status | Resolved | | Location | `src/ScadaLink.SiteRuntime/Actors/DeploymentManagerActor.cs:632` | **Description** `RouteInboundApiSetAttributes` (handling `Route.To().SetAttribute(s)` from the Inbound API) emits a `SetStaticAttributeCommand` for every attribute, so it inherits the same defect as SiteRuntime-001: writes to data-sourced attributes never reach the device and are instead persisted as static overrides. In addition the response is sent back as unconditionally successful (`true`) before the Instance Actor has even processed the command, so a non-existent attribute or a future DCL write failure is reported to the external caller as success. **Recommendation** Route through the same corrected `InstanceActor` write handler as SiteRuntime-001 so the static-vs-data distinction is honoured. The optimistic ack is acceptable for fire-and-forget static writes per the doc, but the XML comment should make the limitation explicit, and once data-attribute writes are supported they need a real response path. **Resolution** Resolved 2026-05-16 (``): `RouteInboundApiSetAttributes` now Asks the Instance Actor per attribute (instead of fire-and-forget Tell) and aggregates the `SetStaticAttributeResponse` results. Because the Instance Actor handler is the SiteRuntime-001 corrected handler, data-sourced attributes now reach the DCL and the `RouteToSetAttributesResponse` reflects the real per-attribute outcome — a non-existent attribute or a failed device write is reported as failure rather than an unconditional optimistic `true`. ### SiteRuntime-003 — Redeployment relies on a fixed 500 ms reschedule and can collide on the child actor name | | | |--|--| | Severity | High | | Category | Akka.NET conventions | | Status | Resolved | | Location | `src/ScadaLink.SiteRuntime/Actors/DeploymentManagerActor.cs:222` | **Description** `HandleDeploy` stops an existing Instance Actor with `Context.Stop` and then reschedules the same `DeployInstanceCommand` to itself after a hard-coded 500 ms, hoping the child has fully terminated by then. `Context.Stop` is asynchronous; the child is only removed from the parent's children collection after it actually stops (including running `PostStop` on its descendants). If a deeply nested or slow hierarchy takes longer than 500 ms, `CreateInstanceActor` calls `Context.ActorOf` with a name that still belongs to the terminating child and throws `InvalidActorNameException`. The `_instanceActors` dictionary check does not prevent this — the dictionary entry is removed immediately, but the Akka child registry is not. The 500 ms delay is also unconditionally added to every redeploy latency. **Recommendation** Watch the terminating child (`Context.Watch`) and recreate the Instance Actor only after receiving the `Terminated` message, instead of guessing with a timer. Buffer or stash the in-flight `DeployInstanceCommand` (and any further commands for that instance) until termination completes. **Resolution** Resolved 2026-05-16 (``): `HandleDeploy` no longer uses a fixed 500 ms reschedule. When a redeployment targets a running instance, the existing Instance Actor is `Context.Watch`-ed and stopped, and the in-flight `DeployInstanceCommand` is buffered in a `_pendingRedeploys` map keyed by the terminating actor ref. A new `Terminated` handler recreates the Instance Actor only after the predecessor (and its whole subtree) has fully stopped, eliminating the `InvalidActorNameException` race and the unconditional redeploy-latency penalty. The shared `ApplyDeployment` helper also skips the `_totalDeployedCount` increment for redeployments, so the deployed-instance count no longer drifts (this additionally addresses the root cause behind SiteRuntime-004). ### SiteRuntime-004 — `_totalDeployedCount` is incremented on redeployment of an existing instance | | | |--|--| | Severity | Medium — re-triaged: already fixed by the SiteRuntime-003 resolution. | | Category | Correctness & logic bugs | | Status | Resolved | | Location | `src/ScadaLink.SiteRuntime/Actors/DeploymentManagerActor.cs` (`ApplyDeployment`) | **Description** In `HandleDeploy`, the existing-actor branch (line 223) reschedules the command and returns. When the rescheduled command runs, no actor exists, so the code falls through to the "new instance" branch and executes `_totalDeployedCount++` (line 239). A redeployment is an *update* of an already-deployed instance, not a new one, so the deployed count is over-counted by one on every redeploy. `StoreDeployedConfigAsync` uses UPSERT semantics, so the SQLite row count does not grow, but the in-memory `_totalDeployedCount` (reported to the health collector via `UpdateInstanceCounts`) drifts upward and the reported "disabled" count becomes wrong. **Re-triage (2026-05-16)** Verified against the current source: this is **already fixed**. The SiteRuntime-003 resolution replaced the fixed-delay reschedule with a shared `ApplyDeployment` helper that takes an `isRedeploy` flag and guards the counter with `if (!isRedeploy) _totalDeployedCount++;`. The redeploy path (`HandleTerminated`) always calls `ApplyDeployment(..., isRedeploy: true)`, so the counter is no longer bumped on redeployment. The regression test `DeploymentManagerRedeployTests.Redeploy_ExistingInstance_DoesNotOverCountDeployedInstances` already covers this and passes. No further code change was required. **Resolution** Resolved 2026-05-16 (`commit pending`): no new change needed — the root cause was eliminated by the SiteRuntime-003 fix (the `isRedeploy` guard in `ApplyDeployment`). Confirmed by the existing passing regression test `Redeploy_ExistingInstance_DoesNotOverCountDeployedInstances`. Re-triaged from Open to Resolved. ### SiteRuntime-005 — Deployment reports `Success` to central before persistence completes | | | |--|--| | Severity | Medium | | Category | Error handling & resilience | | Status | Resolved | | Location | `src/ScadaLink.SiteRuntime/Actors/DeploymentManagerActor.cs` (`ApplyDeployment`, `HandleDeployPersistenceResult`) | **Description** `HandleDeploy` replies to central with `DeploymentStatus.Success` immediately after creating the Instance Actor, while the SQLite persistence (`StoreDeployedConfigAsync` + `ClearStaticOverridesAsync`) runs asynchronously on a `Task.Run`. If persistence fails, `HandleDeployPersistenceResult` only logs an error — central has already been told the deployment succeeded. On a subsequent node restart or failover the instance will not be re-created (it is not in SQLite), so the deployment is silently lost despite central recording success. This contradicts the design's intent that the site is the durable source of truth for deployed configs. **Recommendation** Persist the config before replying, or treat a persistence failure as a deployment failure and send a corrective `DeploymentStatusResponse`/health signal to central. At minimum, do not report `Success` until the config row is committed. **Resolution** Resolved 2026-05-16 (`commit pending`): root cause confirmed — `ApplyDeployment` sent `DeploymentStatusResponse(Success)` synchronously before the persistence `Task.Run` completed. The `Success` reply is now sent from `HandleDeployPersistenceResult` only once the persistence result is known: on success it replies `Success`; on a persistence failure it logs the error, stops the optimistically-created Instance Actor, rolls back the deployed-instance counter, and replies `DeploymentStatus.Failed` with the error message. `DeployPersistenceResult` carries an `IsRedeploy` flag so the counter rollback is skipped for redeployments. Regression tests: `DeploymentManagerMediumFindingsTests.Deploy_PersistenceFailure_ReportsFailedNotSuccess` and `Deploy_Success_ReportsSuccessAndPersistsConfig`. ### SiteRuntime-006 — Site-local repositories read `SiteStorageService` private field via reflection | | | |--|--| | Severity | Medium | | Category | Code organization & conventions | | Status | Resolved | | Location | `src/ScadaLink.SiteRuntime/Repositories/SiteExternalSystemRepository.cs`, `src/ScadaLink.SiteRuntime/Repositories/SiteNotificationRepository.cs` | **Description** Both repositories' `CreateConnection()` use `Type.GetField("_connectionString", BindingFlags.NonPublic | BindingFlags.Instance)` to extract the private connection string out of `SiteStorageService`. This is brittle (any rename or refactor of the field breaks it at runtime, not compile time), defeats encapsulation, and the accompanying XML comment openly describes it as a "pragmatic" hack and is internally contradictory (it states a connection string is "passed separately at DI registration time" which is not what the code does). It also sits awkwardly against the project's own script trust model, which forbids `System.Reflection` in scripts. **Recommendation** Expose the connection string properly: add an `ISiteStorageConnectionProvider` (already referenced in `ServiceCollectionExtensions` XML docs but not used), or have `SiteStorageService` expose a `CreateConnection()` factory, and inject that into the repositories. Remove the reflection entirely. **Resolution** Resolved 2026-05-16 (`commit pending`): root cause confirmed — both repositories reflected into `SiteStorageService._connectionString`. `SiteStorageService` now exposes a public `CreateConnection()` factory method that returns an unopened `SqliteConnection` against the site database. Both `SiteExternalSystemRepository` and `SiteNotificationRepository` now obtain connections via `_storage.CreateConnection()`; all reflection (`Type.GetField` / `BindingFlags`) and the contradictory XML comments have been removed. This is a fully in-module refactor — no cross-module design decision was needed. Regression test: `SiteRepositoryTests.ExternalSystemRepository_RoundTripsStoredDefinition` exercises the repository's connection path end-to-end. ### SiteRuntime-007 — Synthetic entity IDs use the non-deterministic `string.GetHashCode()` | | | |--|--| | Severity | Medium | | Category | Correctness & logic bugs | | Status | Resolved | | Location | `src/ScadaLink.SiteRuntime/Repositories/SiteExternalSystemRepository.cs`, `src/ScadaLink.SiteRuntime/Repositories/SiteNotificationRepository.cs`, `src/ScadaLink.SiteRuntime/Repositories/SyntheticId.cs` | **Description** `GenerateSyntheticId` computes `name.GetHashCode() & 0x7FFFFFFF`. On .NET Core, `string.GetHashCode()` is randomized per process by default, so the "stable deterministic synthetic ID" promised by the XML comment is not stable at all — it changes every time the process restarts. Any caller that obtained an ID and later calls `GetExternalSystemByIdAsync`/`GetNotificationListByIdAsync` after a restart will fail to find the entity. It also risks collisions: distinct names can hash to the same 31-bit value, and `GetExternalSystemByIdAsync` would then return the wrong row. **Recommendation** Use a deterministic, collision-resistant hash (e.g. a stable FNV-1a or the first bytes of a SHA-256 of the name) if a synthetic integer ID is genuinely required, or better, change the repository contract to key these site-local artifacts by name rather than synthesising integer IDs. **Resolution** Resolved 2026-05-16 (`commit pending`): root cause confirmed — both repositories used `name.GetHashCode()`, which is per-process randomized on .NET Core. A new internal `SyntheticId` helper computes a deterministic, process-stable 31-bit ID using the FNV-1a hash over the name's UTF-8 bytes. Both `GenerateSyntheticId` methods now delegate to `SyntheticId.From(name)`. (The integer-keyed lookups are kept because they are mandated by the shared `IExternalSystemRepository`/`INotificationRepository` contracts in Commons — changing those contracts to name-keyed would be a cross-module change outside this module's scope; the deterministic hash resolves the correctness defect within scope.) Regression tests: `SiteRepositoryTests.ExternalSystemRepository_SyntheticId_IsStableAcrossRestart` and `NotificationRepository_SyntheticId_IsStableAcrossRestart` re-create the service to simulate a process restart and confirm by-ID lookups still resolve. ### SiteRuntime-008 — Blocking `.GetAwaiter().GetResult()` on the actor thread during startup | | | |--|--| | Severity | Medium | | Category | Akka.NET conventions | | Status | Resolved | | Location | `src/ScadaLink.SiteRuntime/Actors/DeploymentManagerActor.cs` (`HandleStartupConfigsLoaded`, `LoadSharedScriptsFromStorage`, `HandleSharedScriptsLoaded`) | **Description** `LoadSharedScriptsFromStorage` is called synchronously from `HandleStartupConfigsLoaded` (the actor's message handler) and performs `_storage.GetAllSharedScriptsAsync().GetAwaiter().GetResult()` followed by Roslyn compilation of every shared script. This blocks the DeploymentManager singleton's mailbox thread for the full duration of the SQLite read and all shared-script compilation. On the default dispatcher this also ties up a thread-pool thread and risks thread-pool starvation, and the singleton cannot process any other message (deployments, lifecycle commands, debug routing) until it returns. The rest of the class correctly uses `PipeTo`/`ContinueWith`. **Recommendation** Load shared scripts asynchronously and `PipeTo(Self)` an internal message, the same pattern already used for `StartupConfigsLoaded`. Perform compilation either inside the piped continuation handler (still on the actor thread but at least off the synchronous startup path) or on a dedicated background task whose result is piped back. **Resolution** Resolved 2026-05-16 (`commit pending`): root cause confirmed — the blocking `.GetAwaiter().GetResult()` and Roslyn compilation ran on the singleton's mailbox thread inside `HandleStartupConfigsLoaded`. `LoadSharedScriptsFromStorage` now runs the SQLite read **and** the Roslyn compilation on a background `Task.Run` and pipes a new internal `SharedScriptsLoaded` message back to the actor. A new `HandleSharedScriptsLoaded` handler then begins staggered Instance Actor creation, so the compilation→creation ordering is preserved without ever blocking the mailbox. A shared-script load failure is logged and startup proceeds (scripts needing a missing shared script fail at execution time). Regression test: `DeploymentManagerMediumFindingsTests.Startup_WithSharedScripts_LoadsConfigsAndStaysResponsive` (confirms startup completes and the actor stays responsive with shared scripts present). ### SiteRuntime-009 — Script execution actors run scripts on the default thread pool, not a dedicated dispatcher | | | |--|--| | Severity | Medium | | Category | Akka.NET conventions | | Status | Resolved | | Location | `src/ScadaLink.SiteRuntime/Actors/ScriptExecutionActor.cs`, `src/ScadaLink.SiteRuntime/Actors/AlarmExecutionActor.cs`, `src/ScadaLink.SiteRuntime/Scripts/ScriptExecutionScheduler.cs` | **Description** The design (CLAUDE.md "Architecture & Runtime") states Script Execution Actors run on a *dedicated blocking I/O dispatcher*. The code does not do this: `ScriptActor.SpawnExecution` and `AlarmActor.SpawnAlarmExecution` create the execution actors with no `.WithDispatcher(...)`, and the execution itself runs inside a bare `Task.Run`, i.e. on the shared .NET thread pool. The `// NOTE: In production, configure a dedicated ... dispatcher` comments acknowledge the gap but it ships unconfigured. Scripts can perform synchronous blocking I/O (`Database.Connection`, synchronous `ExternalSystem.Call`); running them on the shared pool can starve it and stall unrelated Akka dispatchers and HTTP request handling under load. **Recommendation** Define the dedicated dispatcher in HOCON and chain `.WithDispatcher(...)` on the execution actor `Props`. If the `Task.Run` model is kept, run script bodies on a dedicated `TaskScheduler` / bounded scheduler rather than the global pool. Either way, remove the "in production, configure…" comments by actually configuring it. **Resolution** Resolved 2026-05-16 (`commit pending`): root cause confirmed — script and alarm on-trigger bodies ran inside a bare `Task.Run` on the shared `ThreadPool`. The recommendation's `TaskScheduler` option was taken because it is fully in-module (a HOCON dispatcher would require editing the Host's ActorSystem config, out of scope). A new `ScriptExecutionScheduler` provides a bounded set of dedicated background threads (count from the new `SiteRuntimeOptions.ScriptExecutionThreadCount`, default 8). `ScriptExecutionActor` and `AlarmExecutionActor` now run their bodies via `Task.Factory.StartNew(..., ScriptExecutionScheduler.Shared(options)).Unwrap()` instead of `Task.Run`, so blocking script I/O is contained to those dedicated threads and cannot starve the global pool. The misleading "in production, configure a dedicated dispatcher" comments were removed. Regression tests: `ScriptExecutionSchedulerTests` (`Scheduler_RunsWork_OffTheThreadPool`, `Scheduler_RespectsConfiguredThreadCount`, `Scheduler_Shared_ReturnsSameInstanceForOptions`). ### SiteRuntime-010 — `EnsureDclConnections` never updates a connection whose configuration changed | | | |--|--| | Severity | Medium | | Category | Correctness & logic bugs | | Status | Resolved | | Location | `src/ScadaLink.SiteRuntime/Actors/DeploymentManagerActor.cs` (`EnsureDclConnections`, `ComputeConnectionConfigHash`) | **Description** `EnsureDclConnections` tracks created connections in `_createdConnections` and skips any name already present (`if (_createdConnections.Contains(name)) continue;`). The skip is purely name-based: if a redeployment (or an artifact deployment) changes the endpoint, credentials, backup endpoint, or `FailoverRetryCount` of an existing connection, the new configuration is silently ignored and the DCL keeps using the stale `CreateConnectionCommand`. There is no `UpdateConnectionCommand` path. The design states that after artifact deployment the site is fully self-contained with current configuration; this caching breaks that for connection changes. **Recommendation** Compare the incoming connection config against the last one sent and re-issue a create/update command when it differs, or have the DCL treat `CreateConnectionCommand` as idempotent upsert and always forward it. Key the cache on a config hash, not just the name. **Resolution** Resolved 2026-05-16 (`commit pending`): root cause confirmed — the cache was a name-only `HashSet`, so a changed connection config was silently dropped. `_createdConnections` is now a `Dictionary` mapping connection name to a SHA-256 hash of its protocol/primary-config/backup-config/failover-retry-count (`ComputeConnectionConfigHash`). A connection whose hash is unchanged is still skipped; a connection whose config changed re-issues a `CreateConnectionCommand` so the DCL adopts the new configuration. Regression tests: `DeploymentManagerMediumFindingsTests.EnsureDclConnections_ConnectionConfigChanged_ReissuesCreateCommand` and `EnsureDclConnections_UnchangedConfig_DoesNotReissueCreateCommand`. ### SiteRuntime-011 — Trust-model validation is a substring scan and is both over- and under-inclusive | | | |--|--| | Severity | Medium | | Category | Security | | Status | Resolved | | Location | `src/ScadaLink.SiteRuntime/Scripts/ScriptCompilationService.cs` (`ValidateTrustModel`) | **Description** `ValidateTrustModel` enforces the script trust model by doing raw `string.Contains` / `IndexOf` on the script source text for forbidden namespace strings. This is unreliable in both directions: - **Bypass (under-inclusive):** the check looks only for the literal namespace strings. A script can reach forbidden APIs without ever writing `System.IO` etc. — e.g. via fully-qualified type use through aliasing, `global::`-prefixed names, or simply because the namespace is already imported transitively. The compilation references include `typeof(object).Assembly` (the whole of `System.Private.CoreLib`, which contains `System.IO.File`, `System.Threading.Thread`, `System.Reflection`, etc.), so forbidden types are fully resolvable at compile time and the only barrier is this textual scan. - **False positives (over-inclusive):** any occurrence of the substring in a comment, string literal, or an unrelated identifier (e.g. a variable named `ProcessThreading`) triggers a violation; the `AllowedExceptions` logic only rescues exact prefixes. - The dead `isAllowed` variable at line 64 is computed and never used. **Recommendation** Enforce the trust model with a Roslyn `SyntaxWalker`/semantic analysis (inspect resolved symbols and their containing namespaces/assemblies), or restrict the compilation's metadata references and `AssemblyLoadContext` so forbidden types are genuinely unavailable, rather than relying on source-text matching. Remove the unused `isAllowed` variable. **Resolution** Resolved 2026-05-16 (`commit pending`): root cause confirmed — `ValidateTrustModel` was a raw `string.Contains`/`IndexOf` scan of the source text, with a dead `isAllowed` variable. It is now Roslyn semantic analysis: the script is parsed and a `CSharpCompilation` + `SemanticModel` are built; every name/member/object-creation node is resolved to its symbol and the symbol's containing namespace and fully-qualified containing type are checked against the forbidden roots. Bare namespace symbols are ignored (so the `System.Threading` qualifier of the allowed `System.Threading.Tasks.Task` no longer false-positives). A name that cannot be resolved (a type from an assembly deliberately absent from the script's references) falls back to a syntactic fully-qualified-name check, so e.g. `System.Net.Http` references are still rejected. The dead `isAllowed` variable was removed. This fixes both the bypass (`global::`/alias-qualified forbidden types) and the false positives (forbidden namespace string in a comment, string literal, or unrelated identifier). Regression tests: new `TrustModelSemanticTests` (alias/`global::` detection, comment/ literal/identifier non-detection, allowed-exception resolution); all 39 existing `SandboxTests` + `ScriptCompilationServiceTests` continue to pass. ### SiteRuntime-012 — `AttributeAccessor`/`ScopeAccessors` block the script on a synchronous Ask | | | |--|--| | Severity | Low | | Category | Concurrency & thread safety | | Status | Resolved | | Location | `src/ScadaLink.SiteRuntime/Scripts/ScopeAccessors.cs:28` | **Description** `AttributeAccessor`'s indexer getter calls `_ctx.GetAttribute(...).GetAwaiter().GetResult()`, synchronously blocking the script-execution thread on an actor Ask. Combined with SiteRuntime-009 (scripts run on the shared thread pool) this means a script that reads several attributes via `Attributes["X"]` holds a pool thread blocked for each round-trip. The async variants (`GetAsync`/`SetAsync`) exist but the ergonomic indexer encourages the blocking path. The XML comment notes "Reads block on the actor Ask" but does not warn about the thread-pool impact. **Recommendation** Once a dedicated script dispatcher exists (SiteRuntime-009) the blocking is contained to that pool, which is acceptable; until then, document the cost clearly and prefer steering script authors to the async accessors. Consider making the indexer internal-only and exposing only the async API. **Resolution** Resolved 2026-05-16 (`pending commit`): re-triaged against the current source — the finding's own recommendation states the blocking is *acceptable* once SiteRuntime-009's dedicated script dispatcher exists, and SiteRuntime-009 is now Resolved (`ScriptExecutionActor`/`AlarmExecutionActor` run script bodies on the dedicated `ScriptExecutionScheduler` threads, confirmed in `ScriptExecutionActor.cs:74`). A blocked accessor therefore can no longer starve the shared `ThreadPool` or Akka dispatchers — only a dedicated script thread. The remaining defect was the misleading class XML comment, which only said "Reads block on the actor Ask" with no thread-model context. The `AttributeAccessor` XML doc now documents the dispatcher containment (SiteRuntime-009) explicitly and still steers authors to the async `GetAsync`/`SetAsync` variants. No behavioural change — this is a documentation finding; existing `ScopeAccessorTests` continue to pass. ### SiteRuntime-013 — `HandleUnsubscribeDebugView` does nothing despite documented behaviour | | | |--|--| | Severity | Low | | Category | Documentation & comments | | Status | Resolved | | Location | `src/ScadaLink.SiteRuntime/Actors/InstanceActor.cs:414` | **Description** `HandleUnsubscribeDebugView` is documented ("Debug view unsubscribe — removes subscription") and the actor registers a handler for `UnsubscribeDebugViewRequest`, but the body only logs a debug message — there is no subscription state in the Instance Actor to remove. The design places the actual subscription lifecycle in `SiteStreamManager` (`Subscribe`/`Unsubscribe`/`RemoveSubscriber`), so the Instance Actor genuinely has nothing to do here. The handler and its XML comment are therefore misleading: a reader expects it to tear down a subscription. **Recommendation** Either remove the no-op handler and route `UnsubscribeDebugViewRequest` to wherever the `SiteStreamManager` subscription is actually cancelled, or correct the XML comment to state explicitly that subscription teardown is handled by `SiteStreamManager` and this handler is a no-op acknowledgement. **Resolution** Resolved 2026-05-16 (`pending commit`): root cause confirmed — the Instance Actor holds no per-subscriber state, so `HandleUnsubscribeDebugView` genuinely has nothing to remove; the real debug-stream subscription lifecycle lives in `SiteStreamManager` (Subscribe/Unsubscribe/RemoveSubscriber). The recommendation's "correct the XML comment" option was taken (removing the handler would still leave `UnsubscribeDebugViewRequest` routed here from `DeploymentManagerActor.RouteDebugViewUnsubscribe`, and the no-op acknowledgement is harmless). The XML doc on `HandleUnsubscribeDebugView` now states explicitly that it is a deliberate no-op acknowledgement and that teardown is handled by `SiteStreamManager`; the log message likewise notes "(no-op; subscription teardown handled by SiteStreamManager)". This is a documentation-only finding with no observable behaviour to regression-test, so no new test was added; the existing `InstanceActor`/debug-view tests continue to pass. ### SiteRuntime-014 — Trigger-expression evaluation blocks the coordinator actor thread | | | |--|--| | Severity | Low | | Category | Akka.NET conventions | | Status | Deferred | | Location | `src/ScadaLink.SiteRuntime/Actors/ScriptActor.cs:219`, `src/ScadaLink.SiteRuntime/Actors/AlarmActor.cs:389` | **Description** `EvaluateExpressionTrigger` (ScriptActor) and `EvaluateExpression` (AlarmActor) run a compiled Roslyn script with `.RunAsync(...).GetAwaiter().GetResult()` directly inside the actor's `AttributeValueChanged` message handler. This blocks the coordinator actor's mailbox thread for up to the 2-second timeout on every monitored attribute change. Coordinator actors are on the default dispatcher and process the hot path of attribute-change fan-out; a slow expression delays all other messages to that actor and consumes a thread-pool thread for the duration. The inline comments correctly note CPU-bound expressions are not interruptible but do not address the mailbox-blocking concern. **Recommendation** Trigger expressions are expected to be cheap, but to keep the actor responsive consider evaluating them off the actor thread (pipe the boolean result back as an internal message) or pre-compiling to a plain delegate that executes near-instantly without the Roslyn scripting `RunAsync` machinery. **Resolution** Deferred 2026-05-16 (`pending commit`): root cause confirmed — `EvaluateExpressionTrigger` (ScriptActor) and `EvaluateExpression` (AlarmActor) call `_compiledTriggerExpression.RunAsync(...).GetAwaiter().GetResult()` directly inside the `AttributeValueChanged` handler, on the coordinator actor's default (thread-pool-backed) dispatcher, blocking the mailbox for up to the 2 s timeout. Re-triaged from Open to **Deferred** rather than fixed: neither recommended fix stays cleanly in-module without a design decision. (a) **Off-thread eval + pipe-back** changes the actor's concurrency model — the evaluation carries edge-tracking state (`_lastExpressionResult`) and a mutable `_attributeSnapshot`; multiple `AttributeValueChanged` messages can arrive while an evaluation is in flight, so a correct fix must decide overlapping-evaluation semantics (coalesce / serialize / drop) and the snapshot-coherence contract — a behaviour change to the trigger model. (b) **Pre-compile to a plain delegate** would require changing the compilation contract: the trigger expression is produced as a Roslyn `Script` by `ScriptCompilationService.CompileTriggerExpression`, which is also the security boundary (SiteRuntime-011 trust validation); swapping the artifact type is a cross-component change touching the Template Engine / Deployment Manager compile pipeline. Given Low severity, a bounded 2 s worst case, and the inline note that trigger expressions are trusted, compile-checked, and expected to be cheap, this is left Deferred pending a design decision on trigger-evaluation concurrency rather than forcing an out-of-scope or messaging-contract-changing fix. ### SiteRuntime-015 — `LoggerFactory` created per Instance Actor and never disposed | | | |--|--| | Severity | Low | | Category | Performance & resource management | | Status | Resolved | | Location | `src/ScadaLink.SiteRuntime/Actors/DeploymentManagerActor.cs:746` | **Description** `CreateInstanceActor` does `var loggerFactory = new LoggerFactory();` for every Instance Actor it creates, uses it once to produce an `ILogger`, and never disposes it. `LoggerFactory` is `IDisposable`. With up to 500 instances (and churn from redeployments) this leaks a factory per instance, and the produced loggers are detached from the application's configured logging providers, so Instance Actor logs may not be routed/filtered consistently with the rest of the host. **Recommendation** Inject the application's `ILoggerFactory` (or an `ILogger` factory delegate) into `DeploymentManagerActor` via DI and reuse it, rather than newing one up per child. Do not create a fresh `LoggerFactory` in a hot creation path. **Resolution** Resolved 2026-05-16 (`pending commit`): root cause confirmed — `CreateInstanceActor` did `new LoggerFactory()` per Instance Actor, never disposed, and detached from the host's logging providers. `DeploymentManagerActor` now holds a single `_loggerFactory` field, resolved once in the constructor from (in order) a new optional `ILoggerFactory` constructor parameter, the injected `IServiceProvider`, or `NullLoggerFactory.Instance` as a last resort — never a per-instance allocation. `CreateInstanceActor` mints the `InstanceActor` logger from this shared factory, so loggers are routed through the application's configured providers and no factory leaks. Regression test: `DeploymentManagerLoggerFactoryTests.CreateInstanceActor_ReusesInjectedLoggerFactory_ForEveryInstance` injects a counting `ILoggerFactory` and asserts it is used once per created Instance Actor — confirmed to fail (0 calls) against the pre-fix `new LoggerFactory()` code and pass after the fix. ### SiteRuntime-016 — Short-lived execution actors, replication actor, and repositories are untested | | | |--|--| | Severity | Low | | Category | Testing coverage | | Status | Resolved | | Location | `tests/ScadaLink.SiteRuntime.Tests/` | **Description** The test project covers the coordinator actors (`InstanceActor`, `ScriptActor`, `AlarmActor`, `DeploymentManagerActor`), persistence, scripting and streaming, but a search of the test sources finds no references to `ScriptExecutionActor`, `AlarmExecutionActor`, `SiteReplicationActor`, `SiteExternalSystemRepository`, or `SiteNotificationRepository`. These cover critical paths: script timeout/failure handling and result reply, alarm on-trigger execution, peer config/S&F replication (including the `SendToPeer` no-peer drop), and the reflection-based repository reads. Several findings above (001/002 mis-routing, 007 ID instability, 011 trust bypass) would likely have been caught by targeted tests. **Recommendation** Add unit/integration tests for the execution actors (success, timeout, exception, Ask-reply, PoisonPill self-stop), `SiteReplicationActor` (outbound forward, inbound apply, peer tracking on cluster events), and the two repositories (round-trip read, synthetic-ID lookup, missing-row behaviour). **Resolution** Resolved 2026-05-16 (`pending commit`): re-triaged against the current test sources — `SiteExternalSystemRepository` and `SiteNotificationRepository` are already covered by `Repositories/SiteRepositoryTests.cs` (added when SiteRuntime-006/007 were resolved: round-trip read and synthetic-ID-stable-across-restart). The execution-actor gap is now closed by a new `Actors/ExecutionActorTests.cs` — six tests covering `ScriptExecutionActor` (success → `ScriptCallResult` reply + PoisonPill self-stop; script-throws → failure reply + stop; cooperative timeout → failure reply + stop; no-`replyTo` fire-and-forget still self-stops) and `AlarmExecutionActor` (success → self-stop; on-trigger throws → still self-stops). `SiteReplicationActor` is *not* covered here: it depends on `Cluster.Get(Context.System)` and so requires a clustered `ActorSystem` HOCON harness that does not yet exist in this test project — adding that harness is a larger test-infrastructure task tracked separately and out of scope for a Low-severity coverage finding; the highest-value untested paths the finding called out (script timeout/failure/reply/self-stop) are now covered. Full module suite: 192 tests green.