docs: add code review process and baseline review of all 19 modules
Establishes a per-module code review workflow under code-reviews/ and
records the 2026-05-16 baseline review (commit 9c60592): 241 findings
across all src/ modules (6 Critical, 46 High, 100 Medium, 89 Low).
This is the clean starting point for remediation work.

code-reviews/SiteRuntime/findings.md

# Code Review — SiteRuntime

| Field | Value |
|-------|-------|
| Module | `src/ScadaLink.SiteRuntime` |
| Design doc | `docs/requirements/Component-SiteRuntime.md` |
| Status | Reviewed |
| Last reviewed | 2026-05-16 |
| Reviewer | claude-agent |
| Commit reviewed | `9c60592` |
| Open findings | 16 |

## Summary

The SiteRuntime module is broadly well-structured: the actor hierarchy matches the
design doc, supervision strategies are explicit, and the trigger/alarm evaluation
logic is thorough. However, the review surfaced one genuinely serious correctness
defect: `Instance.SetAttribute` never routes writes to the Data Connection Layer
for data-sourced attributes, contradicting a core design decision and silently
turning device writes into local-only static overrides. Several other findings
cluster around two themes: (1) actor-thread discipline is violated in a few hot
paths (blocking `.GetAwaiter().GetResult()` calls on the actor thread, a fragile
fixed-delay reschedule for redeployment), and (2) the site-local repositories reach
into `SiteStorageService` private state via reflection and mint entity IDs with the
non-deterministic `string.GetHashCode()`. Script execution runs on the default
thread pool rather than a dedicated blocking dispatcher; the code acknowledges this
in a comment but ships it anyway. Test coverage exists for the coordinator actors,
persistence and scripting, but the short-lived execution actors, the replication
actor, and the repositories are untested.

## Checklist coverage

| # | Category | Examined | Notes |
|---|----------|----------|-------|
| 1 | Correctness & logic bugs | ✓ | SetAttribute mis-routing, deploy double-count, redeploy reschedule race. |
| 2 | Akka.NET conventions | ✓ | Blocking on actor thread, script execution not on a dedicated dispatcher, premature success reply. |
| 3 | Concurrency & thread safety | ✓ | `_attributes` dictionary shared with child actors by reference; `_executionCounter` is actor-confined (OK). |
| 4 | Error handling & resilience | ✓ | Deploy reports Success before persistence; replicated artifact/S&F failures only logged (matches best-effort design). |
| 5 | Security | ✓ | Trust-model validation is substring-based and weak; reflection used to read private fields. |
| 6 | Performance & resource management | ✓ | Per-call SQLite connections (acceptable); CPU-bound scripts not interruptible by timeout. |
| 7 | Design-document adherence | ✓ | SetAttribute DCL routing missing; staggered-startup and supervision otherwise conform. |
| 8 | Code organization & conventions | ✓ | Repositories reflect into another class; synthetic IDs non-deterministic. |
| 9 | Testing coverage | ✓ | No tests for ScriptExecutionActor, AlarmExecutionActor, SiteReplicationActor, or the two repositories. |
| 10 | Documentation & comments | ✓ | Several XML comments describe behaviour the code does not implement (see findings). |

## Findings

### SiteRuntime-001 — `Instance.SetAttribute` never writes to the Data Connection Layer

| | |
|--|--|
| Severity | High |
| Category | Design-document adherence |
| Status | Open |
| Location | `src/ScadaLink.SiteRuntime/Scripts/ScriptRuntimeContext.cs:106`, `src/ScadaLink.SiteRuntime/Actors/InstanceActor.cs:204` |

**Description**

The design doc (Component-SiteRuntime.md, "GetAttribute / SetAttribute" and
"Script Runtime API") states that `Instance.SetAttribute` on a *data-connected*
attribute must send a write request to the DCL, which writes to the physical
device, and that the in-memory value is **not** optimistically updated. For *static*
attributes, it updates memory and persists an override.

The implementation makes no such distinction. `ScriptRuntimeContext.SetAttribute`
unconditionally sends a `SetStaticAttributeCommand`, and `InstanceActor.HandleSetStaticAttribute`
unconditionally treats every write as a static override: it mutates `_attributes`,
publishes an `AttributeValueChanged` with hard-coded `"Good"` quality, notifies
children, and persists a SQLite override. A script's write to a data-sourced attribute
therefore never reaches the device, a device write failure can never be returned
synchronously to the script, and the in-memory value diverges from the device
until the next subscription update overwrites it. The persisted override is also
wrong: data-sourced attributes should not have static overrides.

**Recommendation**

In `InstanceActor`, look up the target attribute in `_configuration.Attributes`. If
it has a non-empty `DataSourceReference`, issue a DCL write (e.g. a `WriteTagRequest`
to `_dclManager`) and surface success or failure to the caller; do not persist an
override and do not optimistically mutate `_attributes`. Only attributes with no
data source reference should follow the current static-override path. Consider
splitting the message into `SetStaticAttributeCommand` vs `SetDataAttributeCommand`,
or branching inside the handler.
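
A minimal sketch of the routing decision the handler would make. It assumes each entry in `_configuration.Attributes` exposes a `Name` and an optional `DataSourceReference`. `AttributeConfig`, `WriteRoute`, and `AttributeWriteRouter` are hypothetical names, and ordinal (case-sensitive) name matching is an assumption. The actor-side work, sending the DCL write and replying to the script, appears only in the comments.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical shape of one entry in _configuration.Attributes; the real type may differ.
public sealed record AttributeConfig(string Name, string? DataSourceReference);

public enum WriteRoute
{
    UnknownAttribute,    // reject the write: no such attribute on this instance
    DataConnectionLayer, // forward to the DCL, reply with its result, leave _attributes untouched
    StaticOverride,      // existing path: update _attributes and persist an override
}

public static class AttributeWriteRouter
{
    public static WriteRoute Classify(IEnumerable<AttributeConfig> attributes, string attributeName)
    {
        // Ordinal (case-sensitive) matching is an assumption about attribute naming.
        var attribute = attributes.FirstOrDefault(
            a => string.Equals(a.Name, attributeName, StringComparison.Ordinal));
        if (attribute is null)
            return WriteRoute.UnknownAttribute;

        // "Data-sourced" means a non-empty DataSourceReference, as the recommendation describes.
        return string.IsNullOrEmpty(attribute.DataSourceReference)
            ? WriteRoute.StaticOverride
            : WriteRoute.DataConnectionLayer;
    }
}
```

Keeping the decision in a pure function lets it be unit-tested without an actor system. The `UnknownAttribute` case also gives the Inbound API path a concrete failure to report instead of an unconditional success.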

**Resolution**

_Unresolved._

### SiteRuntime-002 — `RouteInboundApiSetAttributes` always treats writes as static overrides

| | |
|--|--|
| Severity | High |
| Category | Correctness & logic bugs |
| Status | Open |
| Location | `src/ScadaLink.SiteRuntime/Actors/DeploymentManagerActor.cs:632` |

**Description**

`RouteInboundApiSetAttributes` (handling `Route.To().SetAttribute(s)` from the
Inbound API) emits a `SetStaticAttributeCommand` for every attribute, so it inherits
the same defect as SiteRuntime-001: writes to data-sourced attributes never reach
the device and are instead persisted as static overrides. In addition, the response
is sent back unconditionally as successful (`true`) before the Instance Actor has
even processed the command, so a non-existent attribute or a future DCL write
failure is reported to the external caller as success.

**Recommendation**

Route through the same corrected `InstanceActor` write handler as SiteRuntime-001 so
the static-vs-data distinction is honoured. The optimistic ack is acceptable for
fire-and-forget static writes per the doc, but the XML comment should make the
limitation explicit, and once data-attribute writes are supported they need a real
response path.
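
One possible shape for a real response path. It assumes each per-attribute write can be awaited, for example through an Ask to the Instance Actor that completes only when the write has been applied or the DCL has answered. `AttributeWriteResult`, `InboundSetAttributes`, and the `writeOne` delegate are hypothetical. The point is that the caller receives a per-attribute outcome instead of an unconditional `true`.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;

// Hypothetical result shape for Route.To().SetAttributes(...); not an existing ScadaLink type.
public sealed record AttributeWriteResult(string Name, bool Success, string? Error);

public static class InboundSetAttributes
{
    // writeOne stands in for an Ask to the Instance Actor that completes only once the write
    // has been applied (static attribute) or acknowledged by the DCL (data-sourced attribute).
    public static async Task<IReadOnlyList<AttributeWriteResult>> SetAllAsync(
        IReadOnlyDictionary<string, object?> values,
        Func<string, object?, Task<AttributeWriteResult>> writeOne)
    {
        var pending = values.Select(async kv =>
        {
            try
            {
                return await writeOne(kv.Key, kv.Value).ConfigureAwait(false);
            }
            catch (Exception ex)
            {
                // A failed or timed-out Ask becomes a per-attribute failure, not a blanket "true".
                return new AttributeWriteResult(kv.Key, false, ex.Message);
            }
        });

        // Writes to different attributes proceed concurrently; the results keep input order.
        return await Task.WhenAll(pending).ConfigureAwait(false);
    }
}
```

If the Inbound API contract needs a single success flag, it can be derived as `results.All(r => r.Success)`.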

**Resolution**

_Unresolved._

### SiteRuntime-003 — Redeployment relies on a fixed 500 ms reschedule and can collide on the child actor name

| | |
|--|--|
| Severity | High |
| Category | Akka.NET conventions |
| Status | Open |
| Location | `src/ScadaLink.SiteRuntime/Actors/DeploymentManagerActor.cs:222` |

**Description**

`HandleDeploy` stops an existing Instance Actor with `Context.Stop` and then
reschedules the same `DeployInstanceCommand` to itself after a hard-coded 500 ms,
hoping the child has fully terminated by then. `Context.Stop` is asynchronous; the
child is only removed from the parent's children collection after it actually stops
(including running `PostStop` on its descendants). If a deeply nested or slow
hierarchy takes longer than 500 ms, `CreateInstanceActor` calls `Context.ActorOf`
with a name that still belongs to the terminating child and throws
`InvalidActorNameException`. The `_instanceActors` dictionary check does not prevent
this: the dictionary entry is removed immediately, but the Akka child registry entry
is not. The fixed 500 ms delay is also added to every redeploy, even when the child
stops almost immediately.

**Recommendation**

Watch the terminating child (`Context.Watch`) and recreate the Instance Actor only
after receiving the `Terminated` message, instead of guessing with a timer. Buffer
or stash the in-flight `DeployInstanceCommand` (and any further commands for that
instance) until termination completes.
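
The watch-then-recreate approach needs a little bookkeeping in `DeploymentManagerActor`: it must track which instances are mid-termination and which commands arrived in the meantime. The helper below is a hypothetical, actor-confined sketch of that state. It has no Akka dependency, so it can be unit-tested on its own. The Akka calls around it (`Context.Watch`, `Context.Stop`, `Receive<Terminated>`) are described in comments only, and the sketch assumes each child actor is named after its instance.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper owned by DeploymentManagerActor. It is actor-confined (only touched
// from the actor's message handlers), so it needs no locking.
public sealed class PendingRedeploys<TCommand>
{
    // Instances whose old actor is stopping, with the commands received meanwhile (arrival order).
    private readonly Dictionary<string, Queue<TCommand>> _stopping = new(StringComparer.Ordinal);

    // If the instance is mid-termination, buffer the command and return true (the caller must
    // not process it yet). Otherwise return false and let the caller handle it normally.
    public bool TryBuffer(string instanceName, TCommand command)
    {
        if (!_stopping.TryGetValue(instanceName, out var queue))
            return false;
        queue.Enqueue(command);
        return true;
    }

    // Call when a deploy arrives for a running instance, right before Context.Watch(child)
    // and Context.Stop(child). Throws if the instance is already stopping; TryBuffer is
    // expected to have caught that case first.
    public void BeginStop(string instanceName, TCommand deployCommand) =>
        _stopping.Add(instanceName, new Queue<TCommand>(new[] { deployCommand }));

    // Call from the Receive<Terminated> handler (instance name = Terminated.ActorRef.Path.Name,
    // assuming children are named after instances). The child name is free again, so the caller
    // replays the returned commands in order; the first one recreates the Instance Actor.
    public IReadOnlyList<TCommand> OnTerminated(string instanceName) =>
        _stopping.Remove(instanceName, out var queue) ? queue.ToArray() : Array.Empty<TCommand>();
}
```

Replaying the buffered commands by calling the handlers directly, rather than re-sending them to `Self`, keeps them ahead of newer messages already queued in the mailbox.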

**Resolution**

_Unresolved._

### SiteRuntime-004 — `_totalDeployedCount` is incremented on redeployment of an existing instance

| | |
|--|--|
| Severity | Medium |
| Category | Correctness & logic bugs |
| Status | Open |
| Location | `src/ScadaLink.SiteRuntime/Actors/DeploymentManagerActor.cs:239` |

**Description**

In `HandleDeploy`, the existing-actor branch (line 223) reschedules the command and
returns. When the rescheduled command runs, no actor exists, so the code falls
through to the "new instance" branch and executes `_totalDeployedCount++`
(line 239). A redeployment is an *update* of an already-deployed instance, not a new
one, so the deployed count is over-counted by one on every redeploy.
`StoreDeployedConfigAsync` uses UPSERT semantics, so the SQLite row count does not
grow, but the in-memory `_totalDeployedCount` (reported to the health collector via
`UpdateInstanceCounts`) drifts upward and the reported "disabled" count becomes
wrong.

**Recommendation**

Only increment `_totalDeployedCount` when the instance is genuinely new. Either
track whether this deploy replaced an existing config, or derive the deployed count
from storage / the union of running actors and disabled configs rather than
maintaining a hand-incremented counter.
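
A sketch of deriving the counts from sets keyed by instance name instead of a hand-incremented counter. It assumes instance names are unique per site. `DeployedInstanceTracker` is a hypothetical helper. Its counts are what would be passed to `UpdateInstanceCounts`, whose exact parameters this review does not show.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical tracker: counts are derived from sets keyed by instance name, so a redeploy
// (same name) cannot inflate them the way a hand-incremented counter can.
// Invariant: _enabled is always a subset of _deployed.
public sealed class DeployedInstanceTracker
{
    private readonly HashSet<string> _deployed = new(StringComparer.Ordinal);
    private readonly HashSet<string> _enabled = new(StringComparer.Ordinal);

    // Returns true only when the instance was not previously deployed.
    public bool RecordDeploy(string instanceName, bool enabled)
    {
        bool isNew = _deployed.Add(instanceName);
        if (enabled) _enabled.Add(instanceName);
        else _enabled.Remove(instanceName);
        return isNew;
    }

    public void RecordDisable(string instanceName) => _enabled.Remove(instanceName);

    public void RecordEnable(string instanceName)
    {
        if (_deployed.Contains(instanceName))
            _enabled.Add(instanceName);
    }

    public void RecordDelete(string instanceName)
    {
        _deployed.Remove(instanceName);
        _enabled.Remove(instanceName);
    }

    public int DeployedCount => _deployed.Count;
    public int EnabledCount => _enabled.Count;
    public int DisabledCount => _deployed.Count - _enabled.Count;
}
```

Because the deployed set ignores duplicate names, the redeploy path can call `RecordDeploy` unconditionally. The return value tells the caller whether this was a new instance.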

**Resolution**

_Unresolved._

### SiteRuntime-005 — Deployment reports `Success` to central before persistence completes

| | |
|--|--|
| Severity | Medium |
| Category | Error handling & resilience |
| Status | Open |
| Location | `src/ScadaLink.SiteRuntime/Actors/DeploymentManagerActor.cs:272` |

**Description**

`HandleDeploy` replies to central with `DeploymentStatus.Success` immediately after
creating the Instance Actor, while the SQLite persistence (`StoreDeployedConfigAsync`
and `ClearStaticOverridesAsync`) runs asynchronously inside a `Task.Run`. If
persistence fails, `HandleDeployPersistenceResult` only logs an error, but central
has already been told the deployment succeeded. On a subsequent node restart or
failover the instance will not be re-created (it is not in SQLite), so the
deployment is silently lost despite central recording success. This contradicts the
design's intent that the site is the durable source of truth for deployed configs.

**Recommendation**

Persist the config before replying, or treat a persistence failure as a deployment
failure and send a corrective `DeploymentStatusResponse`/health signal to central.
At minimum, do not report `Success` until the config row is committed.
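
A minimal sketch of the persist-then-reply ordering. The two SQLite writes run off the actor thread, their outcome becomes a message, and the reply to central is sent only when that message is handled. `DeployPersisted`, `DeploymentOutcome`, and `DeployPersistence` are hypothetical stand-ins for the real internal message and `DeploymentStatusResponse`. The storage calls are passed in as delegates.

```csharp
using System;
using System.Threading.Tasks;

// Hypothetical names; DeploymentOutcome stands in for the status sent back to central.
public enum DeploymentOutcome { Success, Failed }

public sealed record DeployPersisted(string InstanceName, DeploymentOutcome Outcome, string? Error);

public static class DeployPersistence
{
    // Runs the SQLite writes and converts the outcome into a message. In the actor this task
    // would be piped back to Self, and the reply to central sent from that message's handler,
    // so Success is reported only after the config row is committed.
    public static async Task<DeployPersisted> PersistAsync(
        string instanceName, Func<Task> storeConfig, Func<Task> clearOverrides)
    {
        try
        {
            await storeConfig().ConfigureAwait(false);
            // Assumption: a failure to clear old overrides also fails the deployment, since
            // stale overrides would be re-applied after a restart.
            await clearOverrides().ConfigureAwait(false);
            return new DeployPersisted(instanceName, DeploymentOutcome.Success, null);
        }
        catch (Exception ex)
        {
            return new DeployPersisted(instanceName, DeploymentOutcome.Failed, ex.Message);
        }
    }
}
```

When wiring this into the actor, capture `Sender` in a local (or carry it in the message) before starting the task. `Sender` no longer refers to central's request by the time the piped result is handled.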

**Resolution**

_Unresolved._

### SiteRuntime-006 — Site-local repositories read `SiteStorageService` private field via reflection

| | |
|--|--|
| Severity | Medium |
| Category | Code organization & conventions |
| Status | Open |
| Location | `src/ScadaLink.SiteRuntime/Repositories/SiteExternalSystemRepository.cs:183`, `src/ScadaLink.SiteRuntime/Repositories/SiteNotificationRepository.cs:181` |

**Description**

Both repositories' `CreateConnection()` methods use `Type.GetField("_connectionString",
BindingFlags.NonPublic | BindingFlags.Instance)` to extract the private connection
string out of `SiteStorageService`. This is brittle (any rename or refactor of the
field breaks it at runtime, not compile time) and defeats encapsulation. The
accompanying XML comment openly describes it as a "pragmatic" hack and is internally
contradictory: it states that a connection string is "passed separately at DI
registration time", which is not what the code does. It also sits awkwardly against
the project's own script trust model, which forbids `System.Reflection` in scripts.

**Recommendation**

Expose the connection string properly: add an `ISiteStorageConnectionProvider`
(already referenced in `ServiceCollectionExtensions` XML docs but not used), or have
`SiteStorageService` expose a `CreateConnection()` factory, and inject that into the
repositories. Remove the reflection entirely.
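
A sketch of replacing the reflection with constructor injection. The finding notes that an `ISiteStorageConnectionProvider` is already mentioned in the `ServiceCollectionExtensions` XML docs, but its members are not known. The single `ConnectionString` property below is therefore an assumption, as are `SiteStorageConnectionProvider` and `SiteNotificationRepositorySketch`.

```csharp
using System;

// Hypothetical contract; the real ISiteStorageConnectionProvider (if defined) may differ.
public interface ISiteStorageConnectionProvider
{
    string ConnectionString { get; }
}

// SiteStorageService, or a small options-backed class, would implement this publicly
// instead of keeping the string in a private field.
public sealed class SiteStorageConnectionProvider : ISiteStorageConnectionProvider
{
    public SiteStorageConnectionProvider(string connectionString)
    {
        if (string.IsNullOrWhiteSpace(connectionString))
            throw new ArgumentException("A SQLite connection string is required.", nameof(connectionString));
        ConnectionString = connectionString;
    }

    public string ConnectionString { get; }
}

// The repository receives the provider through its constructor: no reflection, and a missing
// registration fails when the repository is resolved rather than on its first query.
public sealed class SiteNotificationRepositorySketch
{
    private readonly ISiteStorageConnectionProvider _connections;

    public SiteNotificationRepositorySketch(ISiteStorageConnectionProvider connections) =>
        _connections = connections ?? throw new ArgumentNullException(nameof(connections));

    // CreateConnection() would wrap this in the SQLite provider's connection type; that
    // package is not referenced here, so the sketch stops at the string.
    internal string ConnectionString => _connections.ConnectionString;
}
```

Registration then reduces to a single `AddSingleton<ISiteStorageConnectionProvider>(...)` call, built from whatever configuration `SiteStorageService` is constructed with.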

**Resolution**

_Unresolved._

### SiteRuntime-007 — Synthetic entity IDs use the non-deterministic `string.GetHashCode()`

| | |
|--|--|
| Severity | Medium |
| Category | Correctness & logic bugs |
| Status | Open |
| Location | `src/ScadaLink.SiteRuntime/Repositories/SiteExternalSystemRepository.cs:241`, `src/ScadaLink.SiteRuntime/Repositories/SiteNotificationRepository.cs:254` |

**Description**

`GenerateSyntheticId` computes `name.GetHashCode() & 0x7FFFFFFF`. On .NET Core and
.NET 5 or later, `string.GetHashCode()` is randomized per process by default, so the
"stable deterministic synthetic ID" promised by the XML comment is not stable at
all: it changes every time the process restarts. Any caller that obtained an ID and
later calls `GetExternalSystemByIdAsync`/`GetNotificationListByIdAsync` after a
restart will fail to find the entity. It also risks collisions: distinct names can
hash to the same 31-bit value, and `GetExternalSystemByIdAsync` would then return
the wrong row.

**Recommendation**

If a synthetic integer ID is genuinely required, use a deterministic hash (e.g. a
stable FNV-1a, or the first bytes of a SHA-256 of the name). Determinism does not
remove collisions, because any 31-bit ID can still collide, so the ID lookups should
also detect an ID that maps to more than one name. Better, change the repository
contract to key these site-local artifacts by name rather than synthesising integer
IDs.
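
If a synthetic integer ID is kept, this is a minimal sketch of a process-independent replacement for `name.GetHashCode() & 0x7FFFFFFF`. It uses 32-bit FNV-1a over the UTF-8 bytes of the name. `StableSyntheticId` is a hypothetical name; the constants are the standard FNV-1a 32-bit offset basis and prime.

```csharp
using System;
using System.Text;

// Process-independent replacement for name.GetHashCode() & 0x7FFFFFFF: FNV-1a (32-bit)
// over the UTF-8 bytes of the name gives the same ID on every run and every machine.
public static class StableSyntheticId
{
    private const uint OffsetBasis = 2166136261; // FNV-1a 32-bit offset basis
    private const uint Prime = 16777619;         // FNV-1a 32-bit prime

    public static uint Fnv1a32(string value)
    {
        uint hash = OffsetBasis;
        foreach (byte b in Encoding.UTF8.GetBytes(value))
        {
            hash ^= b;
            hash = unchecked(hash * Prime); // wrap modulo 2^32, as FNV requires
        }
        return hash;
    }

    // Masked to 31 bits so the result fits a non-negative int, matching the current ID range.
    // Collisions remain possible, so an ID lookup should still confirm the name it resolves to.
    public static int ForName(string name) => (int)(Fnv1a32(name) & 0x7FFFFFFFu);
}
```

Because the output depends only on the bytes of the name, an ID handed out before a restart resolves to the same row afterwards. The collision caveat in the recommendation still applies.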

**Resolution**

_Unresolved._

### SiteRuntime-008 — Blocking `.GetAwaiter().GetResult()` on the actor thread during startup

| | |
|--|--|
| Severity | Medium |
| Category | Akka.NET conventions |
| Status | Open |
| Location | `src/ScadaLink.SiteRuntime/Actors/DeploymentManagerActor.cs:479` |

**Description**

`LoadSharedScriptsFromStorage` is called synchronously from
`HandleStartupConfigsLoaded` (the actor's message handler) and performs
`_storage.GetAllSharedScriptsAsync().GetAwaiter().GetResult()` followed by Roslyn
compilation of every shared script. This blocks the DeploymentManager singleton's
mailbox thread for the full duration of the SQLite read and all shared-script
compilation. On the default dispatcher this also ties up a thread-pool thread and
risks thread-pool starvation, and the singleton cannot process any other message
(deployments, lifecycle commands, debug routing) until it returns. The rest of the
class correctly uses `PipeTo`/`ContinueWith`.

**Recommendation**

Load shared scripts asynchronously and `PipeTo(Self)` an internal message, the same
pattern already used for `StartupConfigsLoaded`. Perform compilation either inside
the piped continuation handler (still on the actor thread but at least off the
synchronous startup path) or on a dedicated background task whose result is piped
back.
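
A sketch of the second option: both the read and the compilation move off the actor thread, and the work returns a single message that the actor pipes to itself. `SharedScriptRow`, `SharedScriptsLoaded`, `SharedScriptStartup`, and the `loadAll`/`compile` delegates are hypothetical stand-ins for `GetAllSharedScriptsAsync` and the Roslyn compilation step. Recording per-script failures and carrying on is an assumption about the desired behaviour.

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Tasks;

// Hypothetical shapes; the real shared-script row and compiled artifact types will differ.
public sealed record SharedScriptRow(string Name, string Source);

public sealed record SharedScriptsLoaded(
    IReadOnlyDictionary<string, object> Compiled,
    IReadOnlyDictionary<string, string> Failures);

public static class SharedScriptStartup
{
    // Performs the SQLite read and the compilation on a pool task and returns one message.
    // DeploymentManagerActor would start this from HandleStartupConfigsLoaded, PipeTo(Self),
    // and install the compiled scripts only when SharedScriptsLoaded arrives.
    public static Task<SharedScriptsLoaded> LoadAndCompileAsync(
        Func<Task<IReadOnlyList<SharedScriptRow>>> loadAll,
        Func<SharedScriptRow, object> compile)
    {
        return Task.Run(async () =>
        {
            var rows = await loadAll().ConfigureAwait(false);
            var compiled = new Dictionary<string, object>(StringComparer.Ordinal);
            var failures = new Dictionary<string, string>(StringComparer.Ordinal);
            foreach (var row in rows)
            {
                // One script that fails to compile does not block the others.
                try { compiled[row.Name] = compile(row); }
                catch (Exception ex) { failures[row.Name] = ex.Message; }
            }
            return new SharedScriptsLoaded(compiled, failures);
        });
    }
}
```

If instance actors must not start before the shared scripts are available, the actor would stash deployment messages until `SharedScriptsLoaded` arrives and unstash them in that handler.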

**Resolution**

_Unresolved._

### SiteRuntime-009 — Script execution actors run scripts on the default thread pool, not a dedicated dispatcher

| | |
|--|--|
| Severity | Medium |
| Category | Akka.NET conventions |
| Status | Open |
| Location | `src/ScadaLink.SiteRuntime/Actors/ScriptExecutionActor.cs:72`, `src/ScadaLink.SiteRuntime/Actors/ScriptActor.cs:289`, `src/ScadaLink.SiteRuntime/Actors/AlarmExecutionActor.cs:57` |

**Description**

The design (CLAUDE.md "Architecture & Runtime") states that Script Execution Actors run
on a *dedicated blocking I/O dispatcher*. The code does not do this: `ScriptActor.SpawnExecution`
and `AlarmActor.SpawnAlarmExecution` create the execution actors with no
`.WithDispatcher(...)`, and the execution itself runs inside a bare `Task.Run`,
i.e. on the shared .NET thread pool. The `// NOTE: In production, configure a dedicated ... dispatcher`
comments acknowledge the gap, but no such dispatcher is configured.
Scripts can perform synchronous blocking I/O (`Database.Connection`, synchronous
`ExternalSystem.Call`); running them on the shared pool can starve it and stall
unrelated Akka dispatchers and HTTP request handling under load.

**Recommendation**

Define the dedicated dispatcher in HOCON and chain `.WithDispatcher(...)` on the
execution actor `Props`. If the `Task.Run` model is kept, run script bodies on a
dedicated `TaskScheduler` / bounded scheduler rather than the global pool. Either
way, remove the "in production, configure…" comments by actually configuring it.
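
The recommendation's first option is a dedicated Akka dispatcher configured in HOCON. For the second option, keeping `Task.Run`-style execution but off the global pool, this is a minimal sketch of a `TaskScheduler` backed by a fixed set of dedicated background threads. `DedicatedThreadTaskScheduler` is a hypothetical name. The thread count would need sizing against the expected number of concurrently blocking scripts.

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Threading;
using System.Threading.Tasks;

// A fixed set of dedicated background threads for script bodies, so blocking script I/O
// cannot starve the shared .NET thread pool that Akka's default dispatcher relies on.
public sealed class DedicatedThreadTaskScheduler : TaskScheduler, IDisposable
{
    private readonly BlockingCollection<Task> _queue = new();
    private readonly Thread[] _threads;

    public DedicatedThreadTaskScheduler(int threadCount, string threadNamePrefix)
    {
        if (threadCount < 1) throw new ArgumentOutOfRangeException(nameof(threadCount));
        _threads = new Thread[threadCount];
        for (int i = 0; i < threadCount; i++)
        {
            _threads[i] = new Thread(WorkLoop) { IsBackground = true, Name = $"{threadNamePrefix}-{i}" };
            _threads[i].Start();
        }
    }

    public override int MaximumConcurrencyLevel => _threads.Length;

    // Each worker blocks on the queue and runs tasks until CompleteAdding is called.
    private void WorkLoop()
    {
        foreach (var task in _queue.GetConsumingEnumerable())
            TryExecuteTask(task);
    }

    protected override void QueueTask(Task task) => _queue.Add(task);

    // Inline only on our own threads, so a script body never runs on a pool thread.
    protected override bool TryExecuteTaskInline(Task task, bool taskWasPreviouslyQueued) =>
        !taskWasPreviouslyQueued
        && Array.IndexOf(_threads, Thread.CurrentThread) >= 0
        && TryExecuteTask(task);

    protected override IEnumerable<Task> GetScheduledTasks() => _queue.ToArray();

    public void Dispose()
    {
        _queue.CompleteAdding(); // workers drain what is queued, then exit
        foreach (var thread in _threads) thread.Join();
        _queue.Dispose();
    }
}
```

Script bodies would then start with `Task.Factory.StartNew(body, CancellationToken.None, TaskCreationOptions.DenyChildAttach, scheduler)` instead of `Task.Run`. When there is no `SynchronizationContext`, `await` continuations inside such a body resume on the same scheduler. A nested `Task.Run`, however, would still hop back to the global pool.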

**Resolution**

_Unresolved._

### SiteRuntime-010 — `EnsureDclConnections` never updates a connection whose configuration changed

| | |
|--|--|
| Severity | Medium |
| Category | Correctness & logic bugs |
| Status | Open |
| Location | `src/ScadaLink.SiteRuntime/Actors/DeploymentManagerActor.cs:413` |

**Description**

`EnsureDclConnections` tracks created connections in `_createdConnections` and skips
any name already present (`if (_createdConnections.Contains(name)) continue;`). The
skip is purely name-based: if a redeployment (or an artifact deployment) changes the
endpoint, credentials, backup endpoint, or `FailoverRetryCount` of an existing
connection, the new configuration is silently ignored and the DCL keeps the settings
from the original `CreateConnectionCommand`. There is no `UpdateConnectionCommand`
path. The design states that after artifact deployment the site is fully
self-contained with current configuration; this caching breaks that guarantee for
connection changes.

**Recommendation**

Compare the incoming connection config against the last one sent and re-issue a
create/update command when it differs, or have the DCL treat `CreateConnectionCommand`
as idempotent upsert and always forward it. Key the cache on a config hash, not just
the name.
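
A sketch of change detection for DCL connections. Instead of a hash, it stores the last configuration sent for each connection name and compares by value, which a positional C# `record` provides automatically. `DclConnectionConfig` and `DclConnectionCache` are hypothetical. The config carries only the fields named in this finding, with credentials modelled as an assumed `CredentialsRef`. If the real config contains collections, they would need explicit comparison, because record equality compares them by reference.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical connection shape with the fields listed in the finding.
// A positional record gives value equality: "changed" means any field differs.
public sealed record DclConnectionConfig(
    string Name,
    string Endpoint,
    string? BackupEndpoint,
    string? CredentialsRef,
    int FailoverRetryCount);

public sealed class DclConnectionCache
{
    private readonly Dictionary<string, DclConnectionConfig> _lastSent = new(StringComparer.Ordinal);

    // Returns true (and records the config as sent) when the connection is new or its
    // configuration differs from the last one sent, i.e. when the DeploymentManager should
    // issue a create/update command to the DCL.
    public bool ShouldSend(DclConnectionConfig config)
    {
        if (_lastSent.TryGetValue(config.Name, out var previous) && previous == config)
            return false;
        _lastSent[config.Name] = config;
        return true;
    }
}
```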

**Resolution**

_Unresolved._

### SiteRuntime-011 — Trust-model validation is a substring scan and is both over- and under-inclusive

| | |
|--|--|
| Severity | Medium |
| Category | Security |
| Status | Open |
| Location | `src/ScadaLink.SiteRuntime/Scripts/ScriptCompilationService.cs:52` |

**Description**

`ValidateTrustModel` enforces the script trust model by doing raw `string.Contains` /
`IndexOf` on the script source text for forbidden namespace strings. This is
unreliable in both directions:

- **Bypass (under-inclusive):** the check looks only for the literal namespace
strings, so a script can reach forbidden APIs without ever writing `System.IO` etc.
For example, C# allows whitespace or comments between the parts of a qualified name
(`System. IO.File`, `System/**/.IO.File`), and a namespace that is already imported
transitively need not be named at all. The compilation references include
`typeof(object).Assembly` (the whole of `System.Private.CoreLib`, which contains
`System.IO.File`, `System.Threading.Thread`, `System.Reflection`, etc.), so forbidden
types are fully resolvable at compile time and the only barrier is this textual scan.
- **False positives (over-inclusive):** any occurrence of the substring in a comment,
string literal, or an unrelated identifier (e.g. a variable named `ProcessThreading`)
triggers a violation; the `AllowedExceptions` logic only rescues exact prefixes.
- The dead `isAllowed` variable at line 64 is computed and never used.

**Recommendation**

Enforce the trust model with a Roslyn `SyntaxWalker`/semantic analysis (inspect
resolved symbols and their containing namespaces/assemblies), or restrict the
compilation's metadata references and `AssemblyLoadContext` so forbidden types are
genuinely unavailable, rather than relying on source-text matching. Remove the
unused `isAllowed` variable.

**Resolution**

_Unresolved._

### SiteRuntime-012 — `AttributeAccessor`/`ScopeAccessors` block the script on a synchronous Ask

| | |
|--|--|
| Severity | Low |
| Category | Concurrency & thread safety |
| Status | Open |
| Location | `src/ScadaLink.SiteRuntime/Scripts/ScopeAccessors.cs:28` |

**Description**

`AttributeAccessor`'s indexer getter calls `_ctx.GetAttribute(...).GetAwaiter().GetResult()`,
synchronously blocking the script-execution thread on an actor Ask. Combined with
SiteRuntime-009 (scripts run on the shared thread pool), this means a script that
reads several attributes via `Attributes["X"]` holds a pool thread blocked for each
round-trip. The async variants (`GetAsync`/`SetAsync`) exist, but the ergonomic
indexer encourages the blocking path. The XML comment notes "Reads block on the
actor Ask" but does not warn about the thread-pool impact.

**Recommendation**

Once a dedicated script dispatcher exists (SiteRuntime-009), the blocking is confined
to that pool, which is acceptable; until then, document the cost clearly and steer
script authors to the async accessors. Consider making the indexer internal-only and
exposing only the async API.

**Resolution**

_Unresolved._

### SiteRuntime-013 — `HandleUnsubscribeDebugView` does nothing despite documented behaviour

| | |
|--|--|
| Severity | Low |
| Category | Documentation & comments |
| Status | Open |
| Location | `src/ScadaLink.SiteRuntime/Actors/InstanceActor.cs:414` |

**Description**

`HandleUnsubscribeDebugView` is documented ("Debug view unsubscribe — removes
subscription") and the actor registers a handler for `UnsubscribeDebugViewRequest`,
but the body only logs a debug message; there is no subscription state in the
Instance Actor to remove. The design places the actual subscription lifecycle in
`SiteStreamManager` (`Subscribe`/`Unsubscribe`/`RemoveSubscriber`), so the Instance
Actor genuinely has nothing to do here. The handler and its XML comment are
therefore misleading: a reader expects it to tear down a subscription.

**Recommendation**

Either remove the no-op handler and route `UnsubscribeDebugViewRequest` to wherever
the `SiteStreamManager` subscription is actually cancelled, or correct the XML
comment to state explicitly that subscription teardown is handled by
`SiteStreamManager` and this handler is a no-op acknowledgement.

**Resolution**

_Unresolved._

### SiteRuntime-014 — Trigger-expression evaluation blocks the coordinator actor thread

| | |
|--|--|
| Severity | Low |
| Category | Akka.NET conventions |
| Status | Open |
| Location | `src/ScadaLink.SiteRuntime/Actors/ScriptActor.cs:219`, `src/ScadaLink.SiteRuntime/Actors/AlarmActor.cs:389` |

**Description**

`EvaluateExpressionTrigger` (ScriptActor) and `EvaluateExpression` (AlarmActor) run a
compiled Roslyn script with `.RunAsync(...).GetAwaiter().GetResult()` directly inside
the actor's `AttributeValueChanged` message handler. This blocks the coordinator
actor's mailbox thread for up to the 2-second timeout on every monitored attribute
change. Coordinator actors are on the default dispatcher and process the hot path of
attribute-change fan-out; a slow expression delays all other messages to that actor
and consumes a thread-pool thread for the duration. The inline comments correctly
note that CPU-bound expressions are not interruptible but do not address the
mailbox-blocking concern.

**Recommendation**

Trigger expressions are expected to be cheap, but to keep the actor responsive
consider evaluating them off the actor thread (pipe the boolean result back as an
internal message) or pre-compiling to a plain delegate that executes near-instantly
without the Roslyn scripting `RunAsync` machinery.
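
A sketch of the first option: the expression runs on a background task, and the boolean comes back as an internal message, so the coordinator's mailbox keeps draining. `TriggerEvaluated` and `TriggerEvaluation` are hypothetical names. The compiled expression is modelled as a plain `Func<bool>`, and `Task.WaitAsync(TimeSpan)` requires .NET 6 or later.

```csharp
using System;
using System.Threading.Tasks;

// Hypothetical internal message the coordinator actor would receive via PipeTo(Self).
public sealed record TriggerEvaluated(string AttributeName, bool Fired, bool TimedOut, string? Error);

public static class TriggerEvaluation
{
    // Runs the trigger expression off the actor thread. The actor starts this from its
    // AttributeValueChanged handler, pipes the result to Self, and acts on TriggerEvaluated.
    public static async Task<TriggerEvaluated> EvaluateAsync(
        string attributeName, Func<bool> expression, TimeSpan timeout)
    {
        var evaluation = Task.Run(expression);
        try
        {
            // WaitAsync stops *waiting* at the timeout; like the current code, it cannot
            // interrupt a CPU-bound expression, which keeps running in the background.
            bool fired = await evaluation.WaitAsync(timeout).ConfigureAwait(false);
            return new TriggerEvaluated(attributeName, fired, TimedOut: false, Error: null);
        }
        catch (TimeoutException)
        {
            return new TriggerEvaluated(attributeName, Fired: false, TimedOut: true, Error: null);
        }
        catch (Exception ex)
        {
            return new TriggerEvaluated(attributeName, Fired: false, TimedOut: false, Error: ex.Message);
        }
    }
}
```

Because results now arrive asynchronously, evaluations for successive attribute changes can complete out of order. Tagging each evaluation with a sequence number and ignoring stale results keeps edge-triggered behaviour correct.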

**Resolution**

_Unresolved._

### SiteRuntime-015 — `LoggerFactory` created per Instance Actor and never disposed

| | |
|--|--|
| Severity | Low |
| Category | Performance & resource management |
| Status | Open |
| Location | `src/ScadaLink.SiteRuntime/Actors/DeploymentManagerActor.cs:746` |

**Description**

`CreateInstanceActor` does `var loggerFactory = new LoggerFactory();` for every
Instance Actor it creates, uses it once to produce an `ILogger<InstanceActor>`, and
never disposes it. `LoggerFactory` is `IDisposable`, so with up to 500 instances
(plus churn from redeployments) this leaks one factory per instance. The produced
loggers are also detached from the application's configured logging providers: a
bare `new LoggerFactory()` has no providers unless some are added to it, so Instance
Actor logs are not routed or filtered consistently with the rest of the host and may
not be emitted at all.

**Recommendation**

Inject the application's `ILoggerFactory` (or an `ILogger<InstanceActor>` factory
delegate) into `DeploymentManagerActor` via DI and reuse it, rather than creating one
per child. Do not create a fresh `LoggerFactory` in a hot creation path.

**Resolution**

_Unresolved._

### SiteRuntime-016 — Short-lived execution actors, replication actor, and repositories are untested

| | |
|--|--|
| Severity | Low |
| Category | Testing coverage |
| Status | Open |
| Location | `tests/ScadaLink.SiteRuntime.Tests/` |

**Description**

The test project covers the coordinator actors (`InstanceActor`, `ScriptActor`,
`AlarmActor`, `DeploymentManagerActor`), persistence, scripting and streaming, but a
search of the test sources finds no references to `ScriptExecutionActor`,
`AlarmExecutionActor`, `SiteReplicationActor`, `SiteExternalSystemRepository`, or
`SiteNotificationRepository`. These classes cover critical paths: script
timeout/failure handling and result reply, alarm on-trigger execution, peer
config/S&F replication (including the `SendToPeer` no-peer drop), and the
reflection-based repository reads. Several findings above (001/002 mis-routing, 007
ID instability, 011 trust bypass) would likely have been caught by targeted tests.

**Recommendation**

Add unit/integration tests for the execution actors (success, timeout, exception,
Ask-reply, PoisonPill self-stop), `SiteReplicationActor` (outbound forward, inbound
apply, peer tracking on cluster events), and the two repositories (round-trip read,
synthetic-ID lookup, missing-row behaviour).

**Resolution**

_Unresolved._