Establishes a per-module code review workflow under code-reviews/ and
records the 2026-05-16 baseline review (commit 9c60592): 241 findings
across all src/ modules (6 Critical, 46 High, 100 Medium, 89 Low).
This is the clean starting point for remediation work.
# Code Review — SiteRuntime

| Field | Value |
|-------|-------|
| Module | `src/ScadaLink.SiteRuntime` |
| Design doc | `docs/requirements/Component-SiteRuntime.md` |
| Status | Reviewed |
| Last reviewed | 2026-05-16 |
| Reviewer | claude-agent |
| Commit reviewed | `9c60592` |
| Open findings | 16 |

## Summary

The SiteRuntime module is broadly well-structured: the actor hierarchy matches the
design doc, supervision strategies are explicit, and the trigger/alarm evaluation
logic is thorough. However, the review surfaced one genuinely serious correctness
defect: `Instance.SetAttribute` never routes writes to the Data Connection Layer
for data-sourced attributes, contradicting a core design decision and silently
turning device writes into local-only static overrides. Several other findings
cluster around two themes: (1) actor-thread discipline is violated in a few places
(blocking `.GetAwaiter().GetResult()` calls on the actor thread, a fragile
fixed-delay reschedule for redeployment), and (2) the site-local repositories reach
into `SiteStorageService` private state via reflection and mint entity IDs with the
non-deterministic `string.GetHashCode()`. Script execution runs on the default
thread pool rather than a dedicated blocking dispatcher (the code acknowledges this
in a comment but ships it anyway). Test coverage exists for the coordinator actors,
persistence and scripting, but the short-lived execution actors, the replication
actor, and the repositories are untested.

## Checklist coverage

| # | Category | Examined | Notes |
|---|----------|----------|-------|
| 1 | Correctness & logic bugs | ✓ | SetAttribute mis-routing, deploy double-count, redeploy reschedule race. |
| 2 | Akka.NET conventions | ✓ | Blocking on actor thread, script execution not on a dedicated dispatcher, premature success reply. |
| 3 | Concurrency & thread safety | ✓ | `_attributes` dictionary shared with child actors by reference; `_executionCounter` is actor-confined (OK). |
| 4 | Error handling & resilience | ✓ | Deploy reports Success before persistence; replicated artifact/S&F failures only logged (matches best-effort design). |
| 5 | Security | ✓ | Trust-model validation is substring-based and weak; reflection used to read private fields. |
| 6 | Performance & resource management | ✓ | Per-call SQLite connections (acceptable); CPU-bound scripts not interruptible by timeout. |
| 7 | Design-document adherence | ✓ | SetAttribute DCL routing missing; staggered-startup and supervision otherwise conform. |
| 8 | Code organization & conventions | ✓ | Repositories reflect into another class; synthetic IDs non-deterministic. |
| 9 | Testing coverage | ✓ | No tests for ScriptExecutionActor, AlarmExecutionActor, SiteReplicationActor, or the two repositories. |
| 10 | Documentation & comments | ✓ | Several XML comments describe behaviour the code does not implement (see findings). |

## Findings

### SiteRuntime-001 — `Instance.SetAttribute` never writes to the Data Connection Layer

| | |
|--|--|
| Severity | High |
| Category | Design-document adherence |
| Status | Open |
| Location | `src/ScadaLink.SiteRuntime/Scripts/ScriptRuntimeContext.cs:106`, `src/ScadaLink.SiteRuntime/Actors/InstanceActor.cs:204` |

**Description**

The design doc (Component-SiteRuntime.md, "GetAttribute / SetAttribute" and
"Script Runtime API") states that `Instance.SetAttribute` on a *data-connected*
attribute must send a write request to the DCL, which writes to the physical
device, and that the in-memory value is **not** optimistically updated. For *static*
attributes it updates memory and persists an override.

The implementation makes no such distinction. `ScriptRuntimeContext.SetAttribute`
unconditionally sends a `SetStaticAttributeCommand`, and `InstanceActor.HandleSetStaticAttribute`
unconditionally treats every write as a static override: it mutates `_attributes`,
publishes an `AttributeValueChanged` with hard-coded `"Good"` quality, notifies
children, and persists a SQLite override. A script's write to a data-sourced attribute
therefore never reaches the device, a write failure can never be returned
synchronously to the script, and the in-memory value diverges from the device
until the next subscription update overwrites it. The persisted override is also
wrong: data-sourced attributes should not have static overrides.

**Recommendation**

In `InstanceActor`, look up the target attribute in `_configuration.Attributes`. If
it has a non-empty `DataSourceReference`, issue a DCL write (e.g. a `WriteTagRequest`
to `_dclManager`) and surface success/failure to the caller; do not persist an
override and do not optimistically mutate `_attributes`. Only attributes with no
data source reference should follow the current static-override path. Consider
splitting the message into `SetStaticAttributeCommand` vs `SetDataAttributeCommand`,
or branching inside the handler.

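A minimal sketch of the branch, kept as a pure function so it can be unit-tested apart from the actor. `AttributeConfig`, `WriteRoute` and `SetAttributeRouter` are hypothetical names, and the record is a simplified stand-in for the entries in `_configuration.Attributes`; only `DataSourceReference` comes from the finding.

```csharp
#nullable enable
using System.Collections.Generic;

// Hypothetical, simplified attribute definition; only DataSourceReference comes from the finding.
public sealed record AttributeConfig(string Name, string? DataSourceReference);

// Where a SetAttribute request should be sent.
public enum WriteRoute
{
    UnknownAttribute,    // not defined on this instance: reply with an error, persist nothing
    DataConnectionLayer, // data-sourced: DCL write, no local mutation, no override
    StaticOverride       // static: update _attributes and persist an override (current path)
}

public static class SetAttributeRouter
{
    public static WriteRoute Route(IReadOnlyDictionary<string, AttributeConfig> attributes, string name)
    {
        if (!attributes.TryGetValue(name, out var config))
            return WriteRoute.UnknownAttribute;

        // A non-empty DataSourceReference marks the attribute as data-connected.
        return string.IsNullOrEmpty(config.DataSourceReference)
            ? WriteRoute.StaticOverride
            : WriteRoute.DataConnectionLayer;
    }
}
```

In the actor, `DataConnectionLayer` would become the DCL write suggested in the recommendation, with its outcome returned to the script, and only `StaticOverride` would keep the current mutate-publish-persist path.
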

**Resolution**

_Unresolved._

### SiteRuntime-002 — `RouteInboundApiSetAttributes` always treats writes as static overrides

| | |
|--|--|
| Severity | High |
| Category | Correctness & logic bugs |
| Status | Open |
| Location | `src/ScadaLink.SiteRuntime/Actors/DeploymentManagerActor.cs:632` |

**Description**

`RouteInboundApiSetAttributes` (handling `Route.To().SetAttribute(s)` from the
Inbound API) emits a `SetStaticAttributeCommand` for every attribute, so it inherits
the same defect as SiteRuntime-001: writes to data-sourced attributes never reach
the device and are instead persisted as static overrides. In addition, the response
is sent back as unconditionally successful (`true`) before the Instance Actor has
even processed the command, so a non-existent attribute or a future DCL write
failure is reported to the external caller as success.

**Recommendation**

Route through the same corrected `InstanceActor` write handler as SiteRuntime-001 so
the static-vs-data distinction is honoured. The optimistic ack is acceptable for
fire-and-forget static writes per the doc, but the XML comment should make the
limitation explicit, and once data-attribute writes are supported they need a real
response path.

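A sketch of such a response path, assuming each per-attribute write can be awaited as a `Task<bool>` (for instance, an Ask to the corrected Instance Actor handler). `SetAttributesResult` and `InboundSetAttributes` are hypothetical names. A write that throws, such as an Ask timeout, is reported as a failure for that attribute instead of being folded into an unconditional `true`.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;

// Hypothetical aggregate reply for Route.To().SetAttributes(...).
public sealed record SetAttributesResult(bool Success, IReadOnlyList<string> FailedAttributes);

public static class InboundSetAttributes
{
    // Waits for every per-attribute write and reports success only if all succeeded.
    public static async Task<SetAttributesResult> AggregateAsync(
        IReadOnlyDictionary<string, Func<Task<bool>>> writes)
    {
        var outcomes = await Task.WhenAll(writes.Select(kv => WriteOneAsync(kv.Key, kv.Value)));
        var failed = outcomes.Where(o => !o.Ok).Select(o => o.Name).ToList();
        return new SetAttributesResult(failed.Count == 0, failed);
    }

    // A write that throws (e.g. an Ask timeout) counts as a failure for that attribute.
    private static async Task<(string Name, bool Ok)> WriteOneAsync(string name, Func<Task<bool>> write)
    {
        try { return (name, await write()); }
        catch (Exception) { return (name, false); }
    }
}
```
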

**Resolution**

_Unresolved._

### SiteRuntime-003 — Redeployment relies on a fixed 500 ms reschedule and can collide on the child actor name

| | |
|--|--|
| Severity | High |
| Category | Akka.NET conventions |
| Status | Open |
| Location | `src/ScadaLink.SiteRuntime/Actors/DeploymentManagerActor.cs:222` |

**Description**

`HandleDeploy` stops an existing Instance Actor with `Context.Stop` and then
reschedules the same `DeployInstanceCommand` to itself after a hard-coded 500 ms,
hoping the child has fully terminated by then. `Context.Stop` is asynchronous; the
child is only removed from the parent's children collection after it actually stops
(including running `PostStop` on its descendants). If a deeply nested or slow
hierarchy takes longer than 500 ms, `CreateInstanceActor` calls `Context.ActorOf`
with a name that still belongs to the terminating child and throws
`InvalidActorNameException`. The `_instanceActors` dictionary check does not prevent
this — the dictionary entry is removed immediately, but the Akka child registry is
not. The timer also adds a fixed 500 ms to every redeployment, even when the child
stops almost immediately.

**Recommendation**

Watch the terminating child (`Context.Watch`) and recreate the Instance Actor only
after receiving the `Terminated` message, instead of guessing with a timer. Buffer
or stash the in-flight `DeployInstanceCommand` (and any further commands for that
instance) until termination completes.

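The watch-then-recreate flow needs a little per-instance bookkeeping. The sketch below models it without Akka.NET so the transitions are easy to test; `PendingRedeploys<TCommand>` is a hypothetical helper. In the actor, the first `Enqueue` would accompany `Context.Watch(child)` and `Context.Stop(child)`, commands arriving while `IsTerminating` is true would be enqueued rather than handled, and the `Terminated` handler would call `OnTerminated` and re-send the returned commands to `Self` in order.

```csharp
using System.Collections.Generic;

// Hypothetical bookkeeping for redeploys that wait on child termination.
// TCommand stands in for DeployInstanceCommand and later commands for the same instance.
public sealed class PendingRedeploys<TCommand>
{
    private readonly Dictionary<string, Queue<TCommand>> _waiting = new Dictionary<string, Queue<TCommand>>();

    // True while the named child is stopping; its commands must be buffered, not handled.
    public bool IsTerminating(string instanceName) => _waiting.ContainsKey(instanceName);

    // Buffers a command until the named child has terminated.
    public void Enqueue(string instanceName, TCommand command)
    {
        if (!_waiting.TryGetValue(instanceName, out var queue))
        {
            queue = new Queue<TCommand>();
            _waiting[instanceName] = queue;
        }
        queue.Enqueue(command);
    }

    // Called from the Terminated handler: the child name is free again, so the
    // buffered commands are returned in arrival order for replay.
    public IReadOnlyList<TCommand> OnTerminated(string instanceName)
    {
        if (!_waiting.Remove(instanceName, out var queue))
            return new List<TCommand>();
        return new List<TCommand>(queue);
    }
}
```

Keying the buffer by instance name means a second deploy for the same instance queues behind the first instead of racing it, while other instances are unaffected.
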

**Resolution**

_Unresolved._

### SiteRuntime-004 — `_totalDeployedCount` is incremented on redeployment of an existing instance

| | |
|--|--|
| Severity | Medium |
| Category | Correctness & logic bugs |
| Status | Open |
| Location | `src/ScadaLink.SiteRuntime/Actors/DeploymentManagerActor.cs:239` |

**Description**

In `HandleDeploy`, the existing-actor branch (line 223) reschedules the command and
returns. When the rescheduled command runs, no actor exists, so the code falls
through to the "new instance" branch and executes `_totalDeployedCount++`
(line 239). A redeployment is an *update* of an already-deployed instance, not a new
one, so the deployed count is over-counted by one on every redeploy.
`StoreDeployedConfigAsync` uses UPSERT semantics, so the SQLite row count does not
grow, but the in-memory `_totalDeployedCount` (reported to the health collector via
`UpdateInstanceCounts`) drifts upward and the reported "disabled" count becomes
wrong.

**Recommendation**

Only increment `_totalDeployedCount` when the instance is genuinely new. Either
track whether this deploy replaced an existing config, or derive the deployed count
from storage / the union of running actors and disabled configs rather than
maintaining a hand-incremented counter.

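A sketch of deriving the counts rather than hand-incrementing them, assuming the manager can keep the set of deployed instance names and the subset currently enabled, and that "disabled" means deployed but not enabled. `DeployedInstanceTracker` is a hypothetical name. `HashSet<T>.Add` returns `false` when the name is already present, so a redeployment leaves the total unchanged.

```csharp
using System.Collections.Generic;

// Hypothetical replacement for the hand-incremented _totalDeployedCount.
public sealed class DeployedInstanceTracker
{
    private readonly HashSet<string> _deployed = new HashSet<string>();
    private readonly HashSet<string> _enabled = new HashSet<string>();

    // Returns true only for a genuinely new instance; a redeploy returns false.
    public bool OnDeployed(string instanceName, bool enabled)
    {
        bool isNew = _deployed.Add(instanceName);
        if (enabled) _enabled.Add(instanceName);
        else _enabled.Remove(instanceName);
        return isNew;
    }

    public void OnDeleted(string instanceName)
    {
        _deployed.Remove(instanceName);
        _enabled.Remove(instanceName);
    }

    // Counts are derived from set membership, so they cannot drift.
    public int TotalDeployed => _deployed.Count;
    public int Enabled => _enabled.Count;
    public int Disabled => _deployed.Count - _enabled.Count;
}
```
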

**Resolution**

_Unresolved._

### SiteRuntime-005 — Deployment reports `Success` to central before persistence completes

| | |
|--|--|
| Severity | Medium |
| Category | Error handling & resilience |
| Status | Open |
| Location | `src/ScadaLink.SiteRuntime/Actors/DeploymentManagerActor.cs:272` |

**Description**

`HandleDeploy` replies to central with `DeploymentStatus.Success` immediately after
creating the Instance Actor, while the SQLite persistence (`StoreDeployedConfigAsync`
and `ClearStaticOverridesAsync`) runs asynchronously on a `Task.Run`. If persistence
fails, `HandleDeployPersistenceResult` only logs an error — central has already been
told the deployment succeeded. On a subsequent node restart or failover the instance
will not be re-created (it is not in SQLite), so the deployment is silently lost
despite central recording success. This contradicts the design's intent that the
site is the durable source of truth for deployed configs.

**Recommendation**

Persist the config before replying, or treat a persistence failure as a deployment
failure and send a corrective `DeploymentStatusResponse`/health signal to central.
At minimum, do not report `Success` until the config row is committed.

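A sketch of reporting status only after the writes commit, assuming the two persistence calls can be awaited as delegates. `DeployStatus` and `DeployPersistence.DeployAndPersistAsync` are hypothetical stand-ins for `DeploymentStatus` and the real handler; in the actor, the returned task would be piped back to `Self`, which then sends the reply to central from the actor thread.

```csharp
#nullable enable
using System;
using System.Threading.Tasks;

// Hypothetical stand-in for DeploymentStatus.
public enum DeployStatus { Success, Failed }

public static class DeployPersistence
{
    // Reports Success only once both writes have completed; any persistence
    // exception turns the deployment into a failure that central can see.
    public static async Task<(DeployStatus Status, string? Error)> DeployAndPersistAsync(
        Func<Task> storeDeployedConfig,
        Func<Task> clearStaticOverrides)
    {
        try
        {
            await storeDeployedConfig();
            await clearStaticOverrides();
            return (DeployStatus.Success, null);
        }
        catch (Exception ex)
        {
            return (DeployStatus.Failed, ex.Message);
        }
    }
}
```
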

**Resolution**

_Unresolved._

### SiteRuntime-006 — Site-local repositories read `SiteStorageService` private field via reflection

| | |
|--|--|
| Severity | Medium |
| Category | Code organization & conventions |
| Status | Open |
| Location | `src/ScadaLink.SiteRuntime/Repositories/SiteExternalSystemRepository.cs:183`, `src/ScadaLink.SiteRuntime/Repositories/SiteNotificationRepository.cs:181` |

**Description**

Both repositories' `CreateConnection()` use
`Type.GetField("_connectionString", BindingFlags.NonPublic | BindingFlags.Instance)`
to extract the private connection
string out of `SiteStorageService`. This is brittle (any rename or refactor of the
field breaks it at runtime, not compile time), defeats encapsulation, and the
accompanying XML comment openly describes it as a "pragmatic" hack and is internally
contradictory (it states a connection string is "passed separately at DI
registration time", which is not what the code does). It also sits awkwardly against
the project's own script trust model, which forbids `System.Reflection` in scripts.

**Recommendation**

Expose the connection string properly: add an `ISiteStorageConnectionProvider`
(already referenced in `ServiceCollectionExtensions` XML docs but not used), or have
`SiteStorageService` expose a `CreateConnection()` factory, and inject that into the
repositories. Remove the reflection entirely.

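A sketch of the injected alternative. The finding reports that `ISiteStorageConnectionProvider` is already named in the `ServiceCollectionExtensions` XML docs, but its members are not shown, so the single `ConnectionString` property is an assumption; `FixedConnectionProvider` and `SiteNotificationRepositorySketch` are hypothetical. The dependency becomes visible in the constructor, so refactoring the storage internals can no longer break the repositories at runtime.

```csharp
using System;

// Assumed shape: the real interface's members are not documented in the finding.
public interface ISiteStorageConnectionProvider
{
    string ConnectionString { get; }
}

// Hypothetical provider, registered once in DI from configuration.
public sealed class FixedConnectionProvider : ISiteStorageConnectionProvider
{
    public FixedConnectionProvider(string connectionString) =>
        ConnectionString = connectionString ?? throw new ArgumentNullException(nameof(connectionString));

    public string ConnectionString { get; }
}

// Hypothetical slice of a repository: the connection string arrives by constructor
// injection instead of being read from SiteStorageService's private field.
public sealed class SiteNotificationRepositorySketch
{
    private readonly ISiteStorageConnectionProvider _connections;

    public SiteNotificationRepositorySketch(ISiteStorageConnectionProvider connections) =>
        _connections = connections ?? throw new ArgumentNullException(nameof(connections));

    // A real CreateConnection() would pass this to the SQLite connection constructor.
    public string ConnectionStringForNewConnection() => _connections.ConnectionString;
}
```

Registration could then be a single `services.AddSingleton<ISiteStorageConnectionProvider>(new FixedConnectionProvider(connectionString))`, with `SiteStorageService` consuming the same provider so both read one source of truth.
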

**Resolution**

_Unresolved._

### SiteRuntime-007 — Synthetic entity IDs use the non-deterministic `string.GetHashCode()`

| | |
|--|--|
| Severity | Medium |
| Category | Correctness & logic bugs |
| Status | Open |
| Location | `src/ScadaLink.SiteRuntime/Repositories/SiteExternalSystemRepository.cs:241`, `src/ScadaLink.SiteRuntime/Repositories/SiteNotificationRepository.cs:254` |

**Description**

`GenerateSyntheticId` computes `name.GetHashCode() & 0x7FFFFFFF`. On .NET Core,
`string.GetHashCode()` is randomized per process by default, so the "stable
deterministic synthetic ID" promised by the XML comment is not stable at all — it
changes every time the process restarts. Any caller that obtained an ID and later
calls `GetExternalSystemByIdAsync`/`GetNotificationListByIdAsync` after a restart
will fail to find the entity. It also risks collisions: distinct names can hash to
the same 31-bit value, and `GetExternalSystemByIdAsync` would then return the wrong
row.

**Recommendation**

Use a deterministic hash (e.g. FNV-1a or the first bytes of a SHA-256 of the name)
if a synthetic integer ID is genuinely required, bearing in mind that any 31-bit ID
can still collide, so lookups should confirm the stored name matches; or better,
change the repository contract to key these site-local artifacts by name rather
than synthesising integer IDs.

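A sketch of a process-independent `GenerateSyntheticId`, using 32-bit FNV-1a over the UTF-8 bytes of the name and keeping the existing 31-bit mask. `StableSyntheticId` is a hypothetical name. Unlike `string.GetHashCode()`, the result depends only on the input bytes, so it is identical across restarts and machines.

```csharp
using System.Text;

public static class StableSyntheticId
{
    private const uint FnvOffsetBasis = 2166136261; // 0x811C9DC5
    private const uint FnvPrime = 16777619;         // 0x01000193

    // 32-bit FNV-1a over the UTF-8 bytes, masked to a non-negative int like the current code.
    public static int From(string name)
    {
        uint hash = FnvOffsetBasis;
        foreach (byte b in Encoding.UTF8.GetBytes(name))
        {
            hash ^= b;                         // FNV-1a: xor first...
            hash = unchecked(hash * FnvPrime); // ...then multiply, wrapping at 32 bits
        }
        return (int)(hash & 0x7FFFFFFFu);
    }
}
```

For example, `StableSyntheticId.From("")` is the FNV offset basis masked to 31 bits (`0x011C9DC5`), and `StableSyntheticId.From("a")` is `0x640C292C` in every process.
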

**Resolution**

_Unresolved._

### SiteRuntime-008 — Blocking `.GetAwaiter().GetResult()` on the actor thread during startup

| | |
|--|--|
| Severity | Medium |
| Category | Akka.NET conventions |
| Status | Open |
| Location | `src/ScadaLink.SiteRuntime/Actors/DeploymentManagerActor.cs:479` |

**Description**

`LoadSharedScriptsFromStorage` is called synchronously from
`HandleStartupConfigsLoaded` (the actor's message handler) and performs
`_storage.GetAllSharedScriptsAsync().GetAwaiter().GetResult()` followed by Roslyn
compilation of every shared script. This blocks the DeploymentManager singleton's
mailbox thread for the full duration of the SQLite read and all shared-script
compilation. On the default dispatcher this also ties up a thread-pool thread and
risks thread-pool starvation, and the singleton cannot process any other message
(deployments, lifecycle commands, debug routing) until it returns. The rest of the
class correctly uses `PipeTo`/`ContinueWith`.

**Recommendation**

Load shared scripts asynchronously and `PipeTo(Self)` an internal message, the same
pattern already used for `StartupConfigsLoaded`. Perform compilation either inside
the piped continuation handler (still on the actor thread but at least off the
synchronous startup path) or on a dedicated background task whose result is piped
back.

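The shape of the fix, modelled without Akka.NET: the handler starts the SQLite read and returns immediately, and the outcome later arrives in the mailbox as an ordinary message, which is roughly what `PipeTo(Self)` provides for an actor. A `Channel<object>` stands in for the mailbox here; `SharedScriptsLoaded`, `SharedScriptsLoadFailed` and `StartupSketch` are hypothetical names, and the loader delegate stands in for `_storage.GetAllSharedScriptsAsync()`.

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Channels;
using System.Threading.Tasks;

// Hypothetical internal messages the DeploymentManager would handle.
public sealed record SharedScriptsLoaded(IReadOnlyList<string> ScriptNames);
public sealed record SharedScriptsLoadFailed(Exception Error);

public static class StartupSketch
{
    // Starts the load and returns at once; the outcome is delivered later as a message.
    public static void BeginLoadSharedScripts(
        Func<Task<IReadOnlyList<string>>> loadSharedScripts,
        ChannelWriter<object> mailbox)
    {
        _ = loadSharedScripts().ContinueWith(t =>
        {
            object message = t.IsCompletedSuccessfully
                ? (object)new SharedScriptsLoaded(t.Result)
                : new SharedScriptsLoadFailed(
                    t.Exception?.GetBaseException() ?? new TaskCanceledException(t));
            mailbox.TryWrite(message); // always succeeds on an unbounded channel
        }, TaskScheduler.Default);
    }
}
```

Compilation would then run in the `SharedScriptsLoaded` handler, or inside the background task before the message is written if it should stay off the actor thread entirely.
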

**Resolution**

_Unresolved._

### SiteRuntime-009 — Script execution actors run scripts on the default thread pool, not a dedicated dispatcher

| | |
|--|--|
| Severity | Medium |
| Category | Akka.NET conventions |
| Status | Open |
| Location | `src/ScadaLink.SiteRuntime/Actors/ScriptExecutionActor.cs:72`, `src/ScadaLink.SiteRuntime/Actors/ScriptActor.cs:289`, `src/ScadaLink.SiteRuntime/Actors/AlarmExecutionActor.cs:57` |

**Description**

The design (CLAUDE.md "Architecture & Runtime") states Script Execution Actors run
on a *dedicated blocking I/O dispatcher*. The code does not do this: `ScriptActor.SpawnExecution`
and `AlarmActor.SpawnAlarmExecution` create the execution actors with no
`.WithDispatcher(...)`, and the execution itself runs inside a bare `Task.Run`,
i.e. on the shared .NET thread pool. The
`// NOTE: In production, configure a dedicated ... dispatcher` comments acknowledge
the gap, yet no such dispatcher is configured.
Scripts can perform synchronous blocking I/O (`Database.Connection`, synchronous
`ExternalSystem.Call`); running them on the shared pool can starve it and stall
unrelated Akka dispatchers and HTTP request handling under load.

**Recommendation**

Define the dedicated dispatcher in HOCON and chain `.WithDispatcher(...)` on the
execution actor `Props`. If the `Task.Run` model is kept, run script bodies on a
dedicated `TaskScheduler` / bounded scheduler rather than the global pool. Either
way, remove the "in production, configure…" comments by actually configuring it.

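If the `Task.Run` model is kept, a dedicated scheduler can be as small as the sketch below: a fixed set of named background threads draining a queue, so blocking script bodies never occupy shared thread-pool threads. `DedicatedThreadTaskScheduler` is a hypothetical name; the HOCON dispatcher route remains the more idiomatic fix for the execution actors themselves.

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical scheduler: a fixed set of named background threads that only run script work.
public sealed class DedicatedThreadTaskScheduler : TaskScheduler, IDisposable
{
    private readonly BlockingCollection<Task> _queue = new BlockingCollection<Task>();
    private readonly Thread[] _threads;

    public DedicatedThreadTaskScheduler(int threadCount, string namePrefix)
    {
        if (threadCount < 1) throw new ArgumentOutOfRangeException(nameof(threadCount));

        _threads = Enumerable.Range(0, threadCount)
            .Select(i => new Thread(DrainQueue) { IsBackground = true, Name = $"{namePrefix}-{i}" })
            .ToArray();
        foreach (var thread in _threads) thread.Start();
    }

    // Each worker blocks on the queue and runs tasks until CompleteAdding is called.
    private void DrainQueue()
    {
        foreach (var task in _queue.GetConsumingEnumerable())
            TryExecuteTask(task);
    }

    protected override void QueueTask(Task task) => _queue.Add(task);

    // Never run script work inline on the caller's thread (which may be an actor thread).
    protected override bool TryExecuteTaskInline(Task task, bool taskWasPreviouslyQueued) => false;

    protected override IEnumerable<Task> GetScheduledTasks() => _queue.ToArray();

    public override int MaximumConcurrencyLevel => _threads.Length;

    public void Dispose()
    {
        _queue.CompleteAdding();
        foreach (var thread in _threads) thread.Join();
        _queue.Dispose();
    }
}
```

Call sites would switch from `Task.Run` (which always targets `TaskScheduler.Default`) to `Task.Factory.StartNew(body, CancellationToken.None, TaskCreationOptions.DenyChildAttach, scheduler)`. The thread count then bounds how many blocking scripts can run at once; sizing it is a deployment decision.
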

**Resolution**

_Unresolved._

### SiteRuntime-010 — `EnsureDclConnections` never updates a connection whose configuration changed

| | |
|--|--|
| Severity | Medium |
| Category | Correctness & logic bugs |
| Status | Open |
| Location | `src/ScadaLink.SiteRuntime/Actors/DeploymentManagerActor.cs:413` |

**Description**

`EnsureDclConnections` tracks created connections in `_createdConnections` and skips
any name already present (`if (_createdConnections.Contains(name)) continue;`). The
skip is purely name-based: if a redeployment (or an artifact deployment) changes the
endpoint, credentials, backup endpoint, or `FailoverRetryCount` of an existing
connection, the new configuration is silently ignored and the DCL keeps using the
stale `CreateConnectionCommand`. There is no `UpdateConnectionCommand` path. The
design states that after artifact deployment the site is fully self-contained with
current configuration; this caching breaks that for connection changes.

**Recommendation**

Compare the incoming connection config against the last one sent and re-issue a
create/update command when it differs, or have the DCL treat `CreateConnectionCommand`
as an idempotent upsert and always forward it. Key the cache on a config hash, not just
the name.

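A sketch of keying the cache on the configuration itself rather than the name. C# records compare by value, so a hypothetical `ConnectionConfig` record detects a changed endpoint, backup endpoint, credentials or `FailoverRetryCount` without a hand-written hash. The field list mirrors the finding; the real `CreateConnectionCommand` is assumed to carry equivalent data, and `ConnectionConfigCache` is also hypothetical.

```csharp
#nullable enable
using System.Collections.Generic;

// Hypothetical flattened connection settings; every property takes part in equality.
public sealed record ConnectionConfig(
    string Name,
    string Endpoint,
    string? BackupEndpoint,
    string? Credentials,
    int FailoverRetryCount);

public sealed class ConnectionConfigCache
{
    private readonly Dictionary<string, ConnectionConfig> _lastSent = new Dictionary<string, ConnectionConfig>();

    // True when the DCL has never seen this connection or any of its settings changed,
    // i.e. when a create/update command must be (re)sent.
    public bool ShouldSend(ConnectionConfig incoming)
    {
        if (_lastSent.TryGetValue(incoming.Name, out var previous) && previous == incoming)
            return false;

        _lastSent[incoming.Name] = incoming;
        return true;
    }
}
```

One caveat of the record approach: the compiler-generated `ToString()` prints every property, so a record that carries credentials should not be logged as-is.
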

**Resolution**

_Unresolved._

### SiteRuntime-011 — Trust-model validation is a substring scan and is both over- and under-inclusive

| | |
|--|--|
| Severity | Medium |
| Category | Security |
| Status | Open |
| Location | `src/ScadaLink.SiteRuntime/Scripts/ScriptCompilationService.cs:52` |

**Description**

`ValidateTrustModel` enforces the script trust model by doing raw `string.Contains` /
`IndexOf` on the script source text for forbidden namespace strings. This is
unreliable in both directions:

- **Bypass (under-inclusive):** the check looks only for the literal namespace
strings. A script can reach forbidden APIs without ever writing `System.IO` etc.,
e.g. by putting whitespace between the tokens of a qualified name (`System .IO.File`
still compiles), by reaching types through reflection on an allowed expression
(`typeof(object).Assembly.GetType(...)` with the type name assembled at runtime and
the results held in `var` locals), or simply because the namespace is already
imported transitively. The compilation
references include `typeof(object).Assembly` (the whole of `System.Private.CoreLib`,
which contains `System.IO.File`, `System.Threading.Thread`, `System.Reflection`,
etc.), so forbidden types are fully resolvable at compile time and the only barrier
is this textual scan.
- **False positives (over-inclusive):** any occurrence of the substring in a comment,
string literal, or an unrelated identifier (e.g. a variable named `ProcessThreading`)
triggers a violation; the `AllowedExceptions` logic only rescues exact prefixes.
- The dead `isAllowed` variable at line 64 is computed and never used.

**Recommendation**

Enforce the trust model with a Roslyn `SyntaxWalker`/semantic analysis (inspect
resolved symbols and their containing namespaces/assemblies), or restrict the
compilation's metadata references and `AssemblyLoadContext` so forbidden types are
genuinely unavailable, rather than relying on source-text matching. Remove the
unused `isAllowed` variable.

**Resolution**

_Unresolved._

### SiteRuntime-012 — `AttributeAccessor`/`ScopeAccessors` block the script on a synchronous Ask

| | |
|--|--|
| Severity | Low |
| Category | Concurrency & thread safety |
| Status | Open |
| Location | `src/ScadaLink.SiteRuntime/Scripts/ScopeAccessors.cs:28` |

**Description**

`AttributeAccessor`'s indexer getter calls `_ctx.GetAttribute(...).GetAwaiter().GetResult()`,
synchronously blocking the script-execution thread on an actor Ask. Combined with
SiteRuntime-009 (scripts run on the shared thread pool), this means a script that
reads several attributes via `Attributes["X"]` holds a pool thread blocked for each
round-trip. The async variants (`GetAsync`/`SetAsync`) exist, but the ergonomic
indexer encourages the blocking path. The XML comment notes "Reads block on the
actor Ask" but does not warn about the thread-pool impact.

**Recommendation**

Once a dedicated script dispatcher exists (SiteRuntime-009), the blocking is contained
to that pool, which is acceptable; until then, document the cost clearly and steer
script authors toward the async accessors. Consider making the indexer
internal-only and exposing only the async API.

**Resolution**

_Unresolved._

### SiteRuntime-013 — `HandleUnsubscribeDebugView` does nothing despite documented behaviour

| | |
|--|--|
| Severity | Low |
| Category | Documentation & comments |
| Status | Open |
| Location | `src/ScadaLink.SiteRuntime/Actors/InstanceActor.cs:414` |

**Description**

`HandleUnsubscribeDebugView` is documented ("Debug view unsubscribe — removes
subscription") and the actor registers a handler for `UnsubscribeDebugViewRequest`,
but the body only logs a debug message — there is no subscription state in the
Instance Actor to remove. The design places the actual subscription lifecycle in
`SiteStreamManager` (`Subscribe`/`Unsubscribe`/`RemoveSubscriber`), so the Instance
Actor genuinely has nothing to do here. The handler and its XML comment are
therefore misleading: a reader expects it to tear down a subscription.

**Recommendation**

Either remove the no-op handler and route `UnsubscribeDebugViewRequest` to wherever
the `SiteStreamManager` subscription is actually cancelled, or correct the XML
comment to state explicitly that subscription teardown is handled by
`SiteStreamManager` and this handler is a no-op acknowledgement.

**Resolution**

_Unresolved._

### SiteRuntime-014 — Trigger-expression evaluation blocks the coordinator actor thread

| | |
|--|--|
| Severity | Low |
| Category | Akka.NET conventions |
| Status | Open |
| Location | `src/ScadaLink.SiteRuntime/Actors/ScriptActor.cs:219`, `src/ScadaLink.SiteRuntime/Actors/AlarmActor.cs:389` |

**Description**

`EvaluateExpressionTrigger` (ScriptActor) and `EvaluateExpression` (AlarmActor) run a
compiled Roslyn script with `.RunAsync(...).GetAwaiter().GetResult()` directly inside
the actor's `AttributeValueChanged` message handler. This blocks the coordinator
actor's mailbox thread for up to the 2-second timeout on every monitored attribute
change. Coordinator actors are on the default dispatcher and process the hot path of
attribute-change fan-out; a slow expression delays all other messages to that actor
and consumes a thread-pool thread for the duration. The inline comments correctly
note CPU-bound expressions are not interruptible but do not address the
mailbox-blocking concern.

**Recommendation**

Trigger expressions are expected to be cheap, but to keep the actor responsive
consider evaluating them off the actor thread (pipe the boolean result back as an
internal message) or pre-compiling to a plain delegate that executes near-instantly
without the Roslyn scripting `RunAsync` machinery.

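A sketch of moving evaluation off the coordinator's thread: the expression runs on a background task, and the handler receives a message once it finishes or the 2-second budget expires. `TriggerEvaluated` and `TriggerEvaluation` are hypothetical names, a `Func<bool>` stands in for the compiled Roslyn script, `Task.WaitAsync(TimeSpan)` assumes .NET 6 or later, and in the actor the returned task would be piped to `Self`. As the existing comments note, a CPU-bound expression keeps running after the timeout; this only stops it from holding the mailbox.

```csharp
using System;
using System.Threading.Tasks;

// Hypothetical internal message piped back to ScriptActor / AlarmActor.
public sealed record TriggerEvaluated(string AttributeName, bool? Result, bool TimedOut);

public static class TriggerEvaluation
{
    public static async Task<TriggerEvaluated> EvaluateAsync(
        string attributeName, Func<bool> expression, TimeSpan timeout)
    {
        var evaluation = Task.Run(expression);
        try
        {
            bool result = await evaluation.WaitAsync(timeout).ConfigureAwait(false);
            return new TriggerEvaluated(attributeName, result, TimedOut: false);
        }
        catch (TimeoutException)
        {
            // Only the wait is abandoned; a CPU-bound expression keeps running.
            return new TriggerEvaluated(attributeName, null, TimedOut: true);
        }
    }
}
```

Because evaluations now complete asynchronously, results can arrive in a different order than the attribute changes that triggered them; attaching a sequence number to the message is one way to discard stale results.
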

**Resolution**

_Unresolved._

### SiteRuntime-015 — `LoggerFactory` created per Instance Actor and never disposed

| | |
|--|--|
| Severity | Low |
| Category | Performance & resource management |
| Status | Open |
| Location | `src/ScadaLink.SiteRuntime/Actors/DeploymentManagerActor.cs:746` |

**Description**

`CreateInstanceActor` does `var loggerFactory = new LoggerFactory();` for every
Instance Actor it creates, uses it once to produce an `ILogger<InstanceActor>`, and
never disposes it. `LoggerFactory` is `IDisposable`. With up to 500 instances (and
churn from redeployments) this leaks a factory per instance, and the produced
loggers are detached from the application's configured logging providers, so
Instance Actor logs may not be routed/filtered consistently with the rest of the
host.

**Recommendation**

Inject the application's `ILoggerFactory` (or an `ILogger<InstanceActor>` factory
delegate) into `DeploymentManagerActor` via DI and reuse it, rather than newing one
up per child. Do not create a fresh `LoggerFactory` in a hot creation path.

**Resolution**

_Unresolved._

### SiteRuntime-016 — Short-lived execution actors, replication actor, and repositories are untested

| | |
|--|--|
| Severity | Low |
| Category | Testing coverage |
| Status | Open |
| Location | `tests/ScadaLink.SiteRuntime.Tests/` |

**Description**

The test project covers the coordinator actors (`InstanceActor`, `ScriptActor`,
`AlarmActor`, `DeploymentManagerActor`), persistence, scripting and streaming, but a
search of the test sources finds no references to `ScriptExecutionActor`,
`AlarmExecutionActor`, `SiteReplicationActor`, `SiteExternalSystemRepository`, or
`SiteNotificationRepository`. These cover critical paths: script timeout/failure
handling and result reply, alarm on-trigger execution, peer config/S&F replication
(including the `SendToPeer` no-peer drop), and the reflection-based repository reads.
Several findings above (001/002 mis-routing, 007 ID instability, 011 trust bypass)
would likely have been caught by targeted tests.

**Recommendation**

Add unit/integration tests for the execution actors (success, timeout, exception,
Ask-reply, PoisonPill self-stop), `SiteReplicationActor` (outbound forward, inbound
apply, peer tracking on cluster events), and the two repositories (round-trip read,
synthetic-ID lookup, missing-row behaviour).

**Resolution**

_Unresolved._