48 KiB
Code Review — SiteRuntime
| Field | Value |
|---|---|
| Module | src/ScadaLink.SiteRuntime |
| Design doc | docs/requirements/Component-SiteRuntime.md |
| Status | Reviewed |
| Last reviewed | 2026-05-17 |
| Reviewer | claude-agent |
| Commit reviewed | 39d737e |
| Open findings | 0 |
Summary
The SiteRuntime module is broadly well-structured: the actor hierarchy matches the
design doc, supervision strategies are explicit, and the trigger/alarm evaluation
logic is thorough. However the review surfaced one genuinely serious correctness
defect — Instance.SetAttribute never routes writes to the Data Connection Layer
for data-sourced attributes, contradicting a core design decision and silently
turning device writes into local-only static overrides. Several other findings
cluster around two themes: (1) actor-thread discipline is violated in a few hot
paths (blocking .GetAwaiter().GetResult() calls on the actor thread, a fragile
fixed-delay reschedule for redeployment), and (2) the site-local repositories reach
into SiteStorageService private state via reflection and mint entity IDs with the
non-deterministic string.GetHashCode(). Script execution runs on the default
thread pool rather than a dedicated blocking dispatcher (the code acknowledges this
in a comment but ships it anyway). Test coverage exists for the coordinator actors,
persistence and scripting, but the short-lived execution actors, the replication
actor, and the repositories are untested.
Re-review 2026-05-17 (commit 39d737e)
The module was re-reviewed at commit 39d737e. No source under
src/ScadaLink.SiteRuntime has changed since the previous review at 9c60592
(the only intervening commits are code-review documentation updates), so all of
SiteRuntime-001..013, 015, 016 remain Resolved and SiteRuntime-014 remains
Deferred — its Deferred justification (a trigger-evaluation concurrency design
decision is required before either recommended fix can land in-module) still
holds verbatim against the unchanged ScriptActor/AlarmActor source. The
re-review nonetheless worked through all 10 checklist categories afresh and
surfaced three new findings that the prior pass did not record: a cross-thread
Dictionary enumeration race when the Instance Actor's live _attributes
dictionary is handed by reference into child ScriptActor/AlarmActor
constructors (SiteRuntime-017, Medium); a stale ScriptExecutionActor XML doc
that still claims a "dedicated blocking I/O dispatcher" (SiteRuntime-018, Low);
and two dead lifecycle handlers in InstanceActor that the Deployment Manager
never routes to (SiteRuntime-019, Low). All three were subsequently resolved on
2026-05-17. Open findings: 0.
Checklist coverage
| # | Category | Examined | Notes |
|---|---|---|---|
| 1 | Correctness & logic bugs | ✓ | SetAttribute mis-routing, deploy double-count, redeploy reschedule race. |
| 2 | Akka.NET conventions | ✓ | Blocking on actor thread, script execution not on a dedicated dispatcher, premature success reply. |
| 3 | Concurrency & thread safety | ✓ | _attributes dictionary shared with child actors by reference; _executionCounter is actor-confined (OK). |
| 4 | Error handling & resilience | ✓ | Deploy reports Success before persistence; replicated artifact/S&F failures only logged (matches best-effort design). |
| 5 | Security | ✓ | Trust-model validation is substring-based and weak; reflection used to read private fields. |
| 6 | Performance & resource management | ✓ | Per-call SQLite connections (acceptable); CPU-bound scripts not interruptible by timeout. |
| 7 | Design-document adherence | ✓ | SetAttribute DCL routing missing; staggered-startup and supervision otherwise conform. |
| 8 | Code organization & conventions | ✓ | Repositories reflect into another class; synthetic IDs non-deterministic. |
| 9 | Testing coverage | ✓ | No tests for ScriptExecutionActor, AlarmExecutionActor, SiteReplicationActor, or the two repositories. |
| 10 | Documentation & comments | ✓ | Several XML comments describe behaviour the code does not implement (see findings). |
Findings
SiteRuntime-001 — Instance.SetAttribute never writes to the Data Connection Layer
| Severity | High |
| Category | Design-document adherence |
| Status | Resolved |
| Location | src/ScadaLink.SiteRuntime/Scripts/ScriptRuntimeContext.cs:106, src/ScadaLink.SiteRuntime/Actors/InstanceActor.cs:204 |
Description
The design doc (Component-SiteRuntime.md, "GetAttribute / SetAttribute" and
"Script Runtime API") states that Instance.SetAttribute on a data-connected
attribute must send a write request to the DCL, which writes to the physical
device, and that the in-memory value is not optimistically updated. For static
attributes it updates memory and persists an override.
The implementation makes no such distinction. ScriptRuntimeContext.SetAttribute
unconditionally sends a SetStaticAttributeCommand, and InstanceActor.HandleSetStaticAttribute
unconditionally treats every write as a static override: it mutates _attributes,
publishes an AttributeValueChanged with hard-coded "Good" quality, notifies
children, and persists a SQLite override. A script writing a data-sourced attribute
therefore never reaches the device, the write failure can never be returned
synchronously to the script, and the in-memory value diverges from the device
until the next subscription update overwrites it. The persisted override is also
wrong: data-sourced attributes should not have static overrides.
Recommendation
In InstanceActor, look up the target attribute in _configuration.Attributes. If
it has a non-empty DataSourceReference, issue a DCL write (e.g. a WriteTagRequest
to _dclManager) and surface success/failure to the caller; do not persist an
override and do not optimistically mutate _attributes. Only attributes with no
data source reference should follow the current static-override path. Consider
splitting the message into SetStaticAttributeCommand vs SetDataAttributeCommand,
or branching inside the handler.
Resolution
Resolved 2026-05-16 (<pending>): InstanceActor.HandleSetStaticAttribute now resolves
the target attribute's data binding from _configuration. Data-sourced attributes are
routed via a new HandleSetDataAttribute that Asks the DCL with a WriteTagRequest and
pipes the device-write outcome back to the caller as a SetStaticAttributeResponse —
no override is persisted and _attributes is not optimistically mutated. Static
attributes keep the override path and now also reply with a SetStaticAttributeResponse.
ScriptRuntimeContext.SetAttribute is now async Task and Asks the Instance Actor,
throwing InvalidOperationException on a failed device write so scripts get the failure
synchronously.
SiteRuntime-002 — RouteInboundApiSetAttributes always treats writes as static overrides
| Severity | High |
| Category | Correctness & logic bugs |
| Status | Resolved |
| Location | src/ScadaLink.SiteRuntime/Actors/DeploymentManagerActor.cs:632 |
Description
RouteInboundApiSetAttributes (handling Route.To().SetAttribute(s) from the
Inbound API) emits a SetStaticAttributeCommand for every attribute, so it inherits
the same defect as SiteRuntime-001: writes to data-sourced attributes never reach
the device and are instead persisted as static overrides. In addition the response
is sent back as unconditionally successful (true) before the Instance Actor has
even processed the command, so a non-existent attribute or a future DCL write
failure is reported to the external caller as success.
Recommendation
Route through the same corrected InstanceActor write handler as SiteRuntime-001 so
the static-vs-data distinction is honoured. The optimistic ack is acceptable for
fire-and-forget static writes per the doc, but the XML comment should make the
limitation explicit, and once data-attribute writes are supported they need a real
response path.
Resolution
Resolved 2026-05-16 (<pending>): RouteInboundApiSetAttributes now Asks the Instance
Actor per attribute (instead of fire-and-forget Tell) and aggregates the
SetStaticAttributeResponse results. Because the Instance Actor handler is the
SiteRuntime-001 corrected handler, data-sourced attributes now reach the DCL and the
RouteToSetAttributesResponse reflects the real per-attribute outcome — a non-existent
attribute or a failed device write is reported as failure rather than an unconditional
optimistic true.
SiteRuntime-003 — Redeployment relies on a fixed 500 ms reschedule and can collide on the child actor name
| Severity | High |
| Category | Akka.NET conventions |
| Status | Resolved |
| Location | src/ScadaLink.SiteRuntime/Actors/DeploymentManagerActor.cs:222 |
Description
HandleDeploy stops an existing Instance Actor with Context.Stop and then
reschedules the same DeployInstanceCommand to itself after a hard-coded 500 ms,
hoping the child has fully terminated by then. Context.Stop is asynchronous; the
child is only removed from the parent's children collection after it actually stops
(including running PostStop on its descendants). If a deeply nested or slow
hierarchy takes longer than 500 ms, CreateInstanceActor calls Context.ActorOf
with a name that still belongs to the terminating child and throws
InvalidActorNameException. The _instanceActors dictionary check does not prevent
this — the dictionary entry is removed immediately, but the Akka child registry is
not. The 500 ms delay is also unconditionally added to every redeploy latency.
Recommendation
Watch the terminating child (Context.Watch) and recreate the Instance Actor only
after receiving the Terminated message, instead of guessing with a timer. Buffer
or stash the in-flight DeployInstanceCommand (and any further commands for that
instance) until termination completes.
Resolution
Resolved 2026-05-16 (<pending>): HandleDeploy no longer uses a fixed 500 ms
reschedule. When a redeployment targets a running instance, the existing Instance Actor
is Context.Watch-ed and stopped, and the in-flight DeployInstanceCommand is buffered
in a _pendingRedeploys map keyed by the terminating actor ref. A new Terminated
handler recreates the Instance Actor only after the predecessor (and its whole subtree)
has fully stopped, eliminating the InvalidActorNameException race and the
unconditional redeploy-latency penalty. The shared ApplyDeployment helper also skips
the _totalDeployedCount increment for redeployments, so the deployed-instance count no
longer drifts (this additionally addresses the root cause behind SiteRuntime-004).
SiteRuntime-004 — _totalDeployedCount is incremented on redeployment of an existing instance
| Severity | Medium — re-triaged: already fixed by the SiteRuntime-003 resolution. |
| Category | Correctness & logic bugs |
| Status | Resolved |
| Location | src/ScadaLink.SiteRuntime/Actors/DeploymentManagerActor.cs (ApplyDeployment) |
Description
In HandleDeploy, the existing-actor branch (line 223) reschedules the command and
returns. When the rescheduled command runs, no actor exists, so the code falls
through to the "new instance" branch and executes _totalDeployedCount++
(line 239). A redeployment is an update of an already-deployed instance, not a new
one, so the deployed count is over-counted by one on every redeploy.
StoreDeployedConfigAsync uses UPSERT semantics, so the SQLite row count does not
grow, but the in-memory _totalDeployedCount (reported to the health collector via
UpdateInstanceCounts) drifts upward and the reported "disabled" count becomes
wrong.
Re-triage (2026-05-16)
Verified against the current source: this is already fixed. The SiteRuntime-003
resolution replaced the fixed-delay reschedule with a shared ApplyDeployment helper
that takes an isRedeploy flag and guards the counter with if (!isRedeploy) _totalDeployedCount++;. The redeploy path (HandleTerminated) always calls
ApplyDeployment(..., isRedeploy: true), so the counter is no longer bumped on
redeployment. The regression test
DeploymentManagerRedeployTests.Redeploy_ExistingInstance_DoesNotOverCountDeployedInstances
already covers this and passes. No further code change was required.
Resolution
Resolved 2026-05-16 (commit pending): no new change needed — the root cause was
eliminated by the SiteRuntime-003 fix (the isRedeploy guard in ApplyDeployment).
Confirmed by the existing passing regression test
Redeploy_ExistingInstance_DoesNotOverCountDeployedInstances. Re-triaged from Open to
Resolved.
SiteRuntime-005 — Deployment reports Success to central before persistence completes
| Severity | Medium |
| Category | Error handling & resilience |
| Status | Resolved |
| Location | src/ScadaLink.SiteRuntime/Actors/DeploymentManagerActor.cs (ApplyDeployment, HandleDeployPersistenceResult) |
Description
HandleDeploy replies to central with DeploymentStatus.Success immediately after
creating the Instance Actor, while the SQLite persistence (StoreDeployedConfigAsync
ClearStaticOverridesAsync) runs asynchronously on aTask.Run. If persistence fails,HandleDeployPersistenceResultonly logs an error — central has already been told the deployment succeeded. On a subsequent node restart or failover the instance will not be re-created (it is not in SQLite), so the deployment is silently lost despite central recording success. This contradicts the design's intent that the site is the durable source of truth for deployed configs.
Recommendation
Persist the config before replying, or treat a persistence failure as a deployment
failure and send a corrective DeploymentStatusResponse/health signal to central.
At minimum, do not report Success until the config row is committed.
Resolution
Resolved 2026-05-16 (commit pending): root cause confirmed — ApplyDeployment sent
DeploymentStatusResponse(Success) synchronously before the persistence Task.Run
completed. The Success reply is now sent from HandleDeployPersistenceResult only
once the persistence result is known: on success it replies Success; on a
persistence failure it logs the error, stops the optimistically-created Instance
Actor, rolls back the deployed-instance counter, and replies
DeploymentStatus.Failed with the error message. DeployPersistenceResult carries an
IsRedeploy flag so the counter rollback is skipped for redeployments. Regression
tests: DeploymentManagerMediumFindingsTests.Deploy_PersistenceFailure_ReportsFailedNotSuccess
and Deploy_Success_ReportsSuccessAndPersistsConfig.
SiteRuntime-006 — Site-local repositories read SiteStorageService private field via reflection
| Severity | Medium |
| Category | Code organization & conventions |
| Status | Resolved |
| Location | src/ScadaLink.SiteRuntime/Repositories/SiteExternalSystemRepository.cs, src/ScadaLink.SiteRuntime/Repositories/SiteNotificationRepository.cs |
Description
Both repositories' CreateConnection() use Type.GetField("_connectionString", BindingFlags.NonPublic | BindingFlags.Instance) to extract the private connection
string out of SiteStorageService. This is brittle (any rename or refactor of the
field breaks it at runtime, not compile time), defeats encapsulation, and the
accompanying XML comment openly describes it as a "pragmatic" hack and is internally
contradictory (it states a connection string is "passed separately at DI
registration time" which is not what the code does). It also sits awkwardly against
the project's own script trust model, which forbids System.Reflection in scripts.
Recommendation
Expose the connection string properly: add an ISiteStorageConnectionProvider
(already referenced in ServiceCollectionExtensions XML docs but not used), or have
SiteStorageService expose a CreateConnection() factory, and inject that into the
repositories. Remove the reflection entirely.
Resolution
Resolved 2026-05-16 (commit pending): root cause confirmed — both repositories
reflected into SiteStorageService._connectionString. SiteStorageService now
exposes a public CreateConnection() factory method that returns an unopened
SqliteConnection against the site database. Both SiteExternalSystemRepository and
SiteNotificationRepository now obtain connections via _storage.CreateConnection();
all reflection (Type.GetField / BindingFlags) and the contradictory XML comments
have been removed. This is a fully in-module refactor — no cross-module design
decision was needed. Regression test:
SiteRepositoryTests.ExternalSystemRepository_RoundTripsStoredDefinition exercises
the repository's connection path end-to-end.
SiteRuntime-007 — Synthetic entity IDs use the non-deterministic string.GetHashCode()
| Severity | Medium |
| Category | Correctness & logic bugs |
| Status | Resolved |
| Location | src/ScadaLink.SiteRuntime/Repositories/SiteExternalSystemRepository.cs, src/ScadaLink.SiteRuntime/Repositories/SiteNotificationRepository.cs, src/ScadaLink.SiteRuntime/Repositories/SyntheticId.cs |
Description
GenerateSyntheticId computes name.GetHashCode() & 0x7FFFFFFF. On .NET Core,
string.GetHashCode() is randomized per process by default, so the "stable
deterministic synthetic ID" promised by the XML comment is not stable at all — it
changes every time the process restarts. Any caller that obtained an ID and later
calls GetExternalSystemByIdAsync/GetNotificationListByIdAsync after a restart
will fail to find the entity. It also risks collisions: distinct names can hash to
the same 31-bit value, and GetExternalSystemByIdAsync would then return the wrong
row.
Recommendation
Use a deterministic, collision-resistant hash (e.g. a stable FNV-1a or the first bytes of a SHA-256 of the name) if a synthetic integer ID is genuinely required, or better, change the repository contract to key these site-local artifacts by name rather than synthesising integer IDs.
Resolution
Resolved 2026-05-16 (commit pending): root cause confirmed — both repositories used
name.GetHashCode(), which is per-process randomized on .NET Core. A new internal
SyntheticId helper computes a deterministic, process-stable 31-bit ID using the
FNV-1a hash over the name's UTF-8 bytes. Both GenerateSyntheticId methods now
delegate to SyntheticId.From(name). (The integer-keyed lookups are kept because
they are mandated by the shared IExternalSystemRepository/INotificationRepository
contracts in Commons — changing those contracts to name-keyed would be a cross-module
change outside this module's scope; the deterministic hash resolves the correctness
defect within scope.) Regression tests:
SiteRepositoryTests.ExternalSystemRepository_SyntheticId_IsStableAcrossRestart and
NotificationRepository_SyntheticId_IsStableAcrossRestart re-create the service to
simulate a process restart and confirm by-ID lookups still resolve.
SiteRuntime-008 — Blocking .GetAwaiter().GetResult() on the actor thread during startup
| Severity | Medium |
| Category | Akka.NET conventions |
| Status | Resolved |
| Location | src/ScadaLink.SiteRuntime/Actors/DeploymentManagerActor.cs (HandleStartupConfigsLoaded, LoadSharedScriptsFromStorage, HandleSharedScriptsLoaded) |
Description
LoadSharedScriptsFromStorage is called synchronously from
HandleStartupConfigsLoaded (the actor's message handler) and performs
_storage.GetAllSharedScriptsAsync().GetAwaiter().GetResult() followed by Roslyn
compilation of every shared script. This blocks the DeploymentManager singleton's
mailbox thread for the full duration of the SQLite read and all shared-script
compilation. On the default dispatcher this also ties up a thread-pool thread and
risks thread-pool starvation, and the singleton cannot process any other message
(deployments, lifecycle commands, debug routing) until it returns. The rest of the
class correctly uses PipeTo/ContinueWith.
Recommendation
Load shared scripts asynchronously and PipeTo(Self) an internal message, the same
pattern already used for StartupConfigsLoaded. Perform compilation either inside
the piped continuation handler (still on the actor thread but at least off the
synchronous startup path) or on a dedicated background task whose result is piped
back.
Resolution
Resolved 2026-05-16 (commit pending): root cause confirmed — the blocking
.GetAwaiter().GetResult() and Roslyn compilation ran on the singleton's mailbox
thread inside HandleStartupConfigsLoaded. LoadSharedScriptsFromStorage now runs
the SQLite read and the Roslyn compilation on a background Task.Run and pipes a
new internal SharedScriptsLoaded message back to the actor. A new
HandleSharedScriptsLoaded handler then begins staggered Instance Actor creation, so
the compilation→creation ordering is preserved without ever blocking the mailbox. A
shared-script load failure is logged and startup proceeds (scripts needing a missing
shared script fail at execution time). Regression test:
DeploymentManagerMediumFindingsTests.Startup_WithSharedScripts_LoadsConfigsAndStaysResponsive
(confirms startup completes and the actor stays responsive with shared scripts
present).
SiteRuntime-009 — Script execution actors run scripts on the default thread pool, not a dedicated dispatcher
| Severity | Medium |
| Category | Akka.NET conventions |
| Status | Resolved |
| Location | src/ScadaLink.SiteRuntime/Actors/ScriptExecutionActor.cs, src/ScadaLink.SiteRuntime/Actors/AlarmExecutionActor.cs, src/ScadaLink.SiteRuntime/Scripts/ScriptExecutionScheduler.cs |
Description
The design (CLAUDE.md "Architecture & Runtime") states Script Execution Actors run
on a dedicated blocking I/O dispatcher. The code does not do this: ScriptActor.SpawnExecution
and AlarmActor.SpawnAlarmExecution create the execution actors with no
.WithDispatcher(...), and the execution itself runs inside a bare Task.Run,
i.e. on the shared .NET thread pool. The // NOTE: In production, configure a dedicated ... dispatcher comments acknowledge the gap but it ships unconfigured.
Scripts can perform synchronous blocking I/O (Database.Connection, synchronous
ExternalSystem.Call); running them on the shared pool can starve it and stall
unrelated Akka dispatchers and HTTP request handling under load.
Recommendation
Define the dedicated dispatcher in HOCON and chain .WithDispatcher(...) on the
execution actor Props. If the Task.Run model is kept, run script bodies on a
dedicated TaskScheduler / bounded scheduler rather than the global pool. Either
way, remove the "in production, configure…" comments by actually configuring it.
Resolution
Resolved 2026-05-16 (commit pending): root cause confirmed — script and alarm
on-trigger bodies ran inside a bare Task.Run on the shared ThreadPool. The
recommendation's TaskScheduler option was taken because it is fully in-module (a
HOCON dispatcher would require editing the Host's ActorSystem config, out of scope).
A new ScriptExecutionScheduler provides a bounded set of dedicated background
threads (count from the new SiteRuntimeOptions.ScriptExecutionThreadCount, default
8). ScriptExecutionActor and AlarmExecutionActor now run their bodies via
Task.Factory.StartNew(..., ScriptExecutionScheduler.Shared(options)).Unwrap()
instead of Task.Run, so blocking script I/O is contained to those dedicated threads
and cannot starve the global pool. The misleading "in production, configure a
dedicated dispatcher" comments were removed. Regression tests:
ScriptExecutionSchedulerTests (Scheduler_RunsWork_OffTheThreadPool,
Scheduler_RespectsConfiguredThreadCount, Scheduler_Shared_ReturnsSameInstanceForOptions).
SiteRuntime-010 — EnsureDclConnections never updates a connection whose configuration changed
| Severity | Medium |
| Category | Correctness & logic bugs |
| Status | Resolved |
| Location | src/ScadaLink.SiteRuntime/Actors/DeploymentManagerActor.cs (EnsureDclConnections, ComputeConnectionConfigHash) |
Description
EnsureDclConnections tracks created connections in _createdConnections and skips
any name already present (if (_createdConnections.Contains(name)) continue;). The
skip is purely name-based: if a redeployment (or an artifact deployment) changes the
endpoint, credentials, backup endpoint, or FailoverRetryCount of an existing
connection, the new configuration is silently ignored and the DCL keeps using the
stale CreateConnectionCommand. There is no UpdateConnectionCommand path. The
design states that after artifact deployment the site is fully self-contained with
current configuration; this caching breaks that for connection changes.
Recommendation
Compare the incoming connection config against the last one sent and re-issue a
create/update command when it differs, or have the DCL treat CreateConnectionCommand
as idempotent upsert and always forward it. Key the cache on a config hash, not just
the name.
Resolution
Resolved 2026-05-16 (commit pending): root cause confirmed — the cache was a
name-only HashSet, so a changed connection config was silently dropped.
_createdConnections is now a Dictionary<string,string> mapping connection name to
a SHA-256 hash of its protocol/primary-config/backup-config/failover-retry-count
(ComputeConnectionConfigHash). A connection whose hash is unchanged is still
skipped; a connection whose config changed re-issues a CreateConnectionCommand so
the DCL adopts the new configuration. Regression tests:
DeploymentManagerMediumFindingsTests.EnsureDclConnections_ConnectionConfigChanged_ReissuesCreateCommand
and EnsureDclConnections_UnchangedConfig_DoesNotReissueCreateCommand.
SiteRuntime-011 — Trust-model validation is a substring scan and is both over- and under-inclusive
| Severity | Medium |
| Category | Security |
| Status | Resolved |
| Location | src/ScadaLink.SiteRuntime/Scripts/ScriptCompilationService.cs (ValidateTrustModel) |
Description
ValidateTrustModel enforces the script trust model by doing raw string.Contains /
IndexOf on the script source text for forbidden namespace strings. This is
unreliable in both directions:
- Bypass (under-inclusive): the check looks only for the literal namespace
strings. A script can reach forbidden APIs without ever writing
System.IOetc. — e.g. via fully-qualified type use through aliasing,global::-prefixed names, or simply because the namespace is already imported transitively. The compilation references includetypeof(object).Assembly(the whole ofSystem.Private.CoreLib, which containsSystem.IO.File,System.Threading.Thread,System.Reflection, etc.), so forbidden types are fully resolvable at compile time and the only barrier is this textual scan. - False positives (over-inclusive): any occurrence of the substring in a comment,
string literal, or an unrelated identifier (e.g. a variable named
ProcessThreading) triggers a violation; theAllowedExceptionslogic only rescues exact prefixes. - The dead
isAllowedvariable at line 64 is computed and never used.
Recommendation
Enforce the trust model with a Roslyn SyntaxWalker/semantic analysis (inspect
resolved symbols and their containing namespaces/assemblies), or restrict the
compilation's metadata references and AssemblyLoadContext so forbidden types are
genuinely unavailable, rather than relying on source-text matching. Remove the
unused isAllowed variable.
Resolution
Resolved 2026-05-16 (commit pending): root cause confirmed — ValidateTrustModel
was a raw string.Contains/IndexOf scan of the source text, with a dead isAllowed
variable. It is now Roslyn semantic analysis: the script is parsed and a
CSharpCompilation + SemanticModel are built; every name/member/object-creation
node is resolved to its symbol and the symbol's containing namespace and
fully-qualified containing type are checked against the forbidden roots. Bare
namespace symbols are ignored (so the System.Threading qualifier of the allowed
System.Threading.Tasks.Task no longer false-positives). A name that cannot be
resolved (a type from an assembly deliberately absent from the script's references)
falls back to a syntactic fully-qualified-name check, so e.g. System.Net.Http
references are still rejected. The dead isAllowed variable was removed. This fixes
both the bypass (global::/alias-qualified forbidden types) and the false positives
(forbidden namespace string in a comment, string literal, or unrelated identifier).
Regression tests: new TrustModelSemanticTests (alias/global:: detection, comment/
literal/identifier non-detection, allowed-exception resolution); all 39 existing
SandboxTests + ScriptCompilationServiceTests continue to pass.
SiteRuntime-012 — AttributeAccessor/ScopeAccessors block the script on a synchronous Ask
| Severity | Low |
| Category | Concurrency & thread safety |
| Status | Resolved |
| Location | src/ScadaLink.SiteRuntime/Scripts/ScopeAccessors.cs:28 |
Description
AttributeAccessor's indexer getter calls _ctx.GetAttribute(...).GetAwaiter().GetResult(),
synchronously blocking the script-execution thread on an actor Ask. Combined with
SiteRuntime-009 (scripts run on the shared thread pool) this means a script that
reads several attributes via Attributes["X"] holds a pool thread blocked for each
round-trip. The async variants (GetAsync/SetAsync) exist but the ergonomic
indexer encourages the blocking path. The XML comment notes "Reads block on the
actor Ask" but does not warn about the thread-pool impact.
Recommendation
Once a dedicated script dispatcher exists (SiteRuntime-009) the blocking is contained to that pool, which is acceptable; until then, document the cost clearly and prefer steering script authors to the async accessors. Consider making the indexer internal-only and exposing only the async API.
Resolution
Resolved 2026-05-16 (pending commit): re-triaged against the current source — the
finding's own recommendation states the blocking is acceptable once SiteRuntime-009's
dedicated script dispatcher exists, and SiteRuntime-009 is now Resolved
(ScriptExecutionActor/AlarmExecutionActor run script bodies on the dedicated
ScriptExecutionScheduler threads, confirmed in ScriptExecutionActor.cs:74). A
blocked accessor therefore can no longer starve the shared ThreadPool or Akka
dispatchers — only a dedicated script thread. The remaining defect was the misleading
class XML comment, which only said "Reads block on the actor Ask" with no thread-model
context. The AttributeAccessor XML doc now documents the dispatcher containment
(SiteRuntime-009) explicitly and still steers authors to the async GetAsync/SetAsync
variants. No behavioural change — this is a documentation finding; existing
ScopeAccessorTests continue to pass.
SiteRuntime-013 — HandleUnsubscribeDebugView does nothing despite documented behaviour
| Severity | Low |
| Category | Documentation & comments |
| Status | Resolved |
| Location | src/ScadaLink.SiteRuntime/Actors/InstanceActor.cs:414 |
Description
HandleUnsubscribeDebugView is documented ("Debug view unsubscribe — removes
subscription") and the actor registers a handler for UnsubscribeDebugViewRequest,
but the body only logs a debug message — there is no subscription state in the
Instance Actor to remove. The design places the actual subscription lifecycle in
SiteStreamManager (Subscribe/Unsubscribe/RemoveSubscriber), so the Instance
Actor genuinely has nothing to do here. The handler and its XML comment are
therefore misleading: a reader expects it to tear down a subscription.
Recommendation
Either remove the no-op handler and route UnsubscribeDebugViewRequest to wherever
the SiteStreamManager subscription is actually cancelled, or correct the XML
comment to state explicitly that subscription teardown is handled by
SiteStreamManager and this handler is a no-op acknowledgement.
Resolution
Resolved 2026-05-16 (pending commit): root cause confirmed — the Instance Actor
holds no per-subscriber state, so HandleUnsubscribeDebugView genuinely has nothing to
remove; the real debug-stream subscription lifecycle lives in SiteStreamManager
(Subscribe/Unsubscribe/RemoveSubscriber). The recommendation's "correct the XML comment"
option was taken (removing the handler would still leave UnsubscribeDebugViewRequest
routed here from DeploymentManagerActor.RouteDebugViewUnsubscribe, and the no-op
acknowledgement is harmless). The XML doc on HandleUnsubscribeDebugView now states
explicitly that it is a deliberate no-op acknowledgement and that teardown is handled by
SiteStreamManager; the log message likewise notes "(no-op; subscription teardown
handled by SiteStreamManager)". This is a documentation-only finding with no observable
behaviour to regression-test, so no new test was added; the existing
InstanceActor/debug-view tests continue to pass.
SiteRuntime-014 — Trigger-expression evaluation blocks the coordinator actor thread
| Severity | Low |
| Category | Akka.NET conventions |
| Status | Deferred |
| Location | src/ScadaLink.SiteRuntime/Actors/ScriptActor.cs:219, src/ScadaLink.SiteRuntime/Actors/AlarmActor.cs:389 |
Description
EvaluateExpressionTrigger (ScriptActor) and EvaluateExpression (AlarmActor) run a
compiled Roslyn script with .RunAsync(...).GetAwaiter().GetResult() directly inside
the actor's AttributeValueChanged message handler. This blocks the coordinator
actor's mailbox thread for up to the 2-second timeout on every monitored attribute
change. Coordinator actors are on the default dispatcher and process the hot path of
attribute-change fan-out; a slow expression delays all other messages to that actor
and consumes a thread-pool thread for the duration. The inline comments correctly
note CPU-bound expressions are not interruptible but do not address the
mailbox-blocking concern.
Recommendation
Trigger expressions are expected to be cheap, but to keep the actor responsive
consider evaluating them off the actor thread (pipe the boolean result back as an
internal message) or pre-compiling to a plain delegate that executes near-instantly
without the Roslyn scripting RunAsync machinery.
Resolution
Deferred 2026-05-16 (pending commit): root cause confirmed — EvaluateExpressionTrigger
(ScriptActor) and EvaluateExpression (AlarmActor) call
_compiledTriggerExpression.RunAsync(...).GetAwaiter().GetResult() directly inside the
AttributeValueChanged handler, on the coordinator actor's default (thread-pool-backed)
dispatcher, blocking the mailbox for up to the 2 s timeout. Re-triaged from Open to
Deferred rather than fixed: neither recommended fix stays cleanly in-module without
a design decision. (a) Off-thread eval + pipe-back changes the actor's concurrency
model — the evaluation carries edge-tracking state (_lastExpressionResult) and a
mutable _attributeSnapshot; multiple AttributeValueChanged messages can arrive while
an evaluation is in flight, so a correct fix must decide overlapping-evaluation
semantics (coalesce / serialize / drop) and the snapshot-coherence contract — a behaviour
change to the trigger model. (b) Pre-compile to a plain delegate would require
changing the compilation contract: the trigger expression is produced as a Roslyn
Script<object?> by ScriptCompilationService.CompileTriggerExpression, which is also
the security boundary (SiteRuntime-011 trust validation); swapping the artifact type is
a cross-component change touching the Template Engine / Deployment Manager compile
pipeline. Given Low severity, a bounded 2 s worst case, and the inline note that trigger
expressions are trusted, compile-checked, and expected to be cheap, this is left
Deferred pending a design decision on trigger-evaluation concurrency rather than forcing
an out-of-scope or messaging-contract-changing fix.
SiteRuntime-015 — LoggerFactory created per Instance Actor and never disposed
| Severity | Low |
| Category | Performance & resource management |
| Status | Resolved |
| Location | src/ScadaLink.SiteRuntime/Actors/DeploymentManagerActor.cs:746 |
Description
CreateInstanceActor does var loggerFactory = new LoggerFactory(); for every
Instance Actor it creates, uses it once to produce an ILogger<InstanceActor>, and
never disposes it. LoggerFactory is IDisposable. With up to 500 instances (and
churn from redeployments) this leaks a factory per instance, and the produced
loggers are detached from the application's configured logging providers, so
Instance Actor logs may not be routed/filtered consistently with the rest of the
host.
Recommendation
Inject the application's ILoggerFactory (or an ILogger<InstanceActor> factory
delegate) into DeploymentManagerActor via DI and reuse it, rather than newing one
up per child. Do not create a fresh LoggerFactory in a hot creation path.
Resolution
Resolved 2026-05-16 (pending commit): root cause confirmed — CreateInstanceActor
did new LoggerFactory() per Instance Actor, never disposed, and detached from the
host's logging providers. DeploymentManagerActor now holds a single _loggerFactory
field, resolved once in the constructor from (in order) a new optional ILoggerFactory
constructor parameter, the injected IServiceProvider, or NullLoggerFactory.Instance
as a last resort — never a per-instance allocation. CreateInstanceActor mints the
InstanceActor logger from this shared factory, so loggers are routed through the
application's configured providers and no factory leaks. Regression test:
DeploymentManagerLoggerFactoryTests.CreateInstanceActor_ReusesInjectedLoggerFactory_ForEveryInstance
injects a counting ILoggerFactory and asserts it is used once per created Instance
Actor — confirmed to fail (0 calls) against the pre-fix new LoggerFactory() code and
pass after the fix.
SiteRuntime-016 — Short-lived execution actors, replication actor, and repositories are untested
| Severity | Low |
| Category | Testing coverage |
| Status | Resolved |
| Location | tests/ScadaLink.SiteRuntime.Tests/ |
Description
The test project covers the coordinator actors (InstanceActor, ScriptActor,
AlarmActor, DeploymentManagerActor), persistence, scripting and streaming, but a
search of the test sources finds no references to ScriptExecutionActor,
AlarmExecutionActor, SiteReplicationActor, SiteExternalSystemRepository, or
SiteNotificationRepository. These cover critical paths: script timeout/failure
handling and result reply, alarm on-trigger execution, peer config/S&F replication
(including the SendToPeer no-peer drop), and the reflection-based repository reads.
Several findings above (001/002 mis-routing, 007 ID instability, 011 trust bypass)
would likely have been caught by targeted tests.
Recommendation
Add unit/integration tests for the execution actors (success, timeout, exception,
Ask-reply, PoisonPill self-stop), SiteReplicationActor (outbound forward, inbound
apply, peer tracking on cluster events), and the two repositories (round-trip read,
synthetic-ID lookup, missing-row behaviour).
Resolution
Resolved 2026-05-16 (pending commit): re-triaged against the current test sources —
SiteExternalSystemRepository and SiteNotificationRepository are already covered by
Repositories/SiteRepositoryTests.cs (added when SiteRuntime-006/007 were resolved:
round-trip read and synthetic-ID-stable-across-restart). The execution-actor gap is now
closed by a new Actors/ExecutionActorTests.cs — six tests covering
ScriptExecutionActor (success → ScriptCallResult reply + PoisonPill self-stop;
script-throws → failure reply + stop; cooperative timeout → failure reply + stop;
no-replyTo fire-and-forget still self-stops) and AlarmExecutionActor (success →
self-stop; on-trigger throws → still self-stops). SiteReplicationActor is not covered
here: it depends on Cluster.Get(Context.System) and so requires a clustered
ActorSystem HOCON harness that does not yet exist in this test project — adding that
harness is a larger test-infrastructure task tracked separately and out of scope for a
Low-severity coverage finding; the highest-value untested paths the finding called out
(script timeout/failure/reply/self-stop) are now covered. Full module suite: 192 tests
green.
SiteRuntime-017 — Instance Actor's live _attributes dictionary is shared by reference into child actor constructors
| Severity | Medium |
| Category | Concurrency & thread safety |
| Status | Resolved |
| Location | src/ScadaLink.SiteRuntime/Actors/InstanceActor.cs:625, src/ScadaLink.SiteRuntime/Actors/InstanceActor.cs:675, src/ScadaLink.SiteRuntime/Actors/ScriptActor.cs:83, src/ScadaLink.SiteRuntime/Actors/AlarmActor.cs:93 |
Description
InstanceActor.CreateChildActors passes the Instance Actor's own mutable
_attributes field (a plain Dictionary<string, object?>) by reference into the
Props.Create(...) factory for every ScriptActor and AlarmActor (as the
initialAttributes constructor argument). Each child constructor then iterates
that dictionary to seed its _attributeSnapshot:
if (initialAttributes != null)
foreach (var kvp in initialAttributes)
_attributeSnapshot[kvp.Key] = kvp.Value;
Context.ActorOf returns immediately; the child actor's constructor runs later on
the child's mailbox thread. Meanwhile the Instance Actor's PreStart returns and
the Instance Actor begins processing its mailbox — HandleTagValueUpdate and
HandleAttributeValueChanged both mutate _attributes (_attributes[...] = ...).
A DCL tag update that arrives before a child has finished its constructor copy
therefore mutates the dictionary on the Instance Actor thread while the child
thread is enumerating it. Dictionary<,> is explicitly not safe for concurrent
read/write: the enumeration can throw InvalidOperationException ("collection was
modified") — which surfaces as an ActorInitializationException and, under the
Instance Actor's SupervisorStrategy, stops the child (the strategy returns
Stop for ActorInitializationException). The script or alarm is then silently
absent for the life of the instance. A torn read of an entry is also possible. The
window is small but deterministically reachable on a busy site at startup/failover
— exactly the staggered-startup scenario the design is most concerned about.
Recommendation
Do not share the live dictionary. Snapshot it on the Instance Actor thread before
constructing the child — e.g. pass new Dictionary<string, object?>(_attributes)
(or an immutable copy) into each Props.Create. The copy is made on the Instance
Actor thread inside CreateChildActors, so it is race-free, and each child gets a
private dictionary to seed from.
Resolution
Resolved 2026-05-17 (commit pending): root cause confirmed — CreateChildActors
captured the live _attributes field directly in every child Props.Create
closure. CreateChildActors now takes a single new Dictionary<string,object?>(_attributes)
snapshot on the Instance Actor thread and hands each ScriptActor/AlarmActor that
private copy, so no child constructor ever enumerates a dictionary the Instance
Actor is concurrently mutating. Regression test:
InstanceActorChildAttributeRaceTests.ChildActors_AreSeededFromAnIsolatedCopy_NotTheLiveAttributesDictionary
asserts every child's seed dictionary is a distinct object from the Instance
Actor's live _attributes (confirmed to fail — "seeded ... by reference" — against
the pre-fix code and pass after). ScriptActor/AlarmActor expose an internal
SeedAttributesReference for this assertion (InternalsVisibleTo added for the
test project).
SiteRuntime-018 — ScriptExecutionActor XML doc still claims a "dedicated blocking I/O dispatcher"
| Severity | Low |
| Category | Documentation & comments |
| Status | Resolved |
| Location | src/ScadaLink.SiteRuntime/Actors/ScriptExecutionActor.cs:17 |
Description
The class-level XML summary on ScriptExecutionActor states "Runs on a dedicated
blocking I/O dispatcher." That is not what the code does. SiteRuntime-009 was
resolved by introducing ScriptExecutionScheduler (a bounded dedicated
TaskScheduler); the actor itself and its mailbox run on the default Akka
dispatcher, and only the script body runs on the scheduler's threads via
Task.Factory.StartNew(..., scheduler). The resolution of SiteRuntime-009
explicitly chose the TaskScheduler route instead of a HOCON dispatcher and
even removed the "in production, configure a dedicated dispatcher" comments
elsewhere — but this stale summary line was missed. A reader is told the actor is
on a dedicated dispatcher when it is not, which is misleading when reasoning about
mailbox throughput and thread-pool pressure. (AlarmExecutionActor does not carry
the equivalent claim — its summary only says "Same pattern as ScriptExecutionActor.")
Recommendation
Correct the summary to describe the actual model: the actor runs on the default
dispatcher and the script body is dispatched onto the dedicated
ScriptExecutionScheduler (SiteRuntime-009). Align the wording with the accurate
comment already present at ScriptExecutionActor.cs:71-73.
Resolution
Resolved 2026-05-17 (commit pending): root cause confirmed — the stale
"Runs on a dedicated blocking I/O dispatcher" line in the ScriptExecutionActor
class summary was missed when SiteRuntime-009 was resolved. The summary now states
the actual model: the actor and its mailbox run on the default Akka dispatcher and
only the script body is dispatched onto the dedicated ScriptExecutionScheduler
(SiteRuntime-009). Documentation-only change with no observable behaviour, so no
regression test was added; the existing suite continues to pass.
SiteRuntime-019 — Dead DisableInstanceCommand / EnableInstanceCommand handlers in InstanceActor
| Severity | Low |
| Category | Correctness & logic bugs |
| Status | Resolved |
| Location | src/ScadaLink.SiteRuntime/Actors/InstanceActor.cs:106, src/ScadaLink.SiteRuntime/Actors/InstanceActor.cs:113 |
Description
InstanceActor's constructor registers Receive<DisableInstanceCommand> and
Receive<EnableInstanceCommand> handlers that log and reply with a successful
InstanceLifecycleResponse. These handlers are unreachable. The Deployment Manager
is the only sender of those commands, and DeploymentManagerActor.HandleDisable /
HandleEnable handle the lifecycle entirely themselves — they call
Context.Stop(actor) (disable) or CreateInstanceActor(...) (enable) directly and
reply to the original sender from the Deployment Manager. Neither command is ever
Forward-ed or Tell-ed to the Instance Actor. The handlers are dead code, and
they are actively misleading: a maintainer reading InstanceActor would reasonably
believe disable/enable is partly an Instance-Actor responsibility, and the no-op
"true" reply implies an instance-side acknowledgement contract that does not exist.
If a future change did route these commands here, the disable handler would do
nothing useful (it does not stop children or tear down state — Akka does that when
the parent stops the actor).
Recommendation
Remove the two Receive<...> registrations and their handler bodies from
InstanceActor, since the Deployment Manager owns the disable/enable lifecycle.
If the intent is to keep them for a future instance-side hook, add an XML comment
stating that the Deployment Manager currently handles these and the handlers are a
reserved placeholder — but removal is preferred.
Resolution
Resolved 2026-05-17 (commit pending): re-verified as genuinely dead code — a
codebase-wide search confirms DisableInstanceCommand/EnableInstanceCommand are
only ever sent (from central) to the site DeploymentManagerActor, whose
HandleDisable/HandleEnable own the lifecycle entirely (Context.Stop /
CreateInstanceActor) and never Forward/Tell the command to the Instance
Actor. The two unreachable Receive<...> registrations and their no-op
"success" handler bodies were removed from InstanceActor, replaced with a comment
stating the Deployment Manager owns this lifecycle. Regression test:
InstanceActorTests.InstanceActor_DoesNotHandleDisableOrEnableCommands asserts the
Instance Actor produces no InstanceLifecycleResponse for either command
(confirmed to fail against the pre-fix dead handlers and pass after removal).