docs(code-reviews): re-review batch 4 at 39d737e — SiteEventLogging, SiteRuntime, StoreAndForward, TemplateEngine
11 new findings: SiteEventLogging-012..014, SiteRuntime-017..019, StoreAndForward-015..017, TemplateEngine-015..016.
@@ -5,10 +5,10 @@
| Module | `src/ScadaLink.SiteRuntime` |
| Design doc | `docs/requirements/Component-SiteRuntime.md` |
| Status | Reviewed |
-| Last reviewed | 2026-05-16 |
+| Last reviewed | 2026-05-17 |
| Reviewer | claude-agent |
-| Commit reviewed | `9c60592` |
-| Open findings | 0 |
+| Commit reviewed | `39d737e` |
+| Open findings | 3 |

## Summary

@@ -28,6 +28,24 @@ in a comment but ships it anyway). Test coverage exists for the coordinator acto
persistence and scripting, but the short-lived execution actors, the replication
actor, and the repositories are untested.

#### Re-review 2026-05-17 (commit `39d737e`)

The module was re-reviewed at commit `39d737e`. No source under
`src/ScadaLink.SiteRuntime` has changed since the previous review at `9c60592`
(the only intervening commits are code-review documentation updates), so all of
SiteRuntime-001..013, 015, 016 remain Resolved and SiteRuntime-014 remains
Deferred — its Deferred justification (a trigger-evaluation concurrency design
decision is required before either recommended fix can land in-module) still
holds verbatim against the unchanged `ScriptActor`/`AlarmActor` source. The
re-review nonetheless worked through all 10 checklist categories afresh and
surfaced three new findings that the prior pass did not record: a cross-thread
`Dictionary` enumeration race when the Instance Actor's live `_attributes`
dictionary is handed by reference into child `ScriptActor`/`AlarmActor`
constructors (SiteRuntime-017, Medium); a stale `ScriptExecutionActor` XML doc
that still claims a "dedicated blocking I/O dispatcher" (SiteRuntime-018, Low);
and two dead lifecycle handlers in `InstanceActor` that the Deployment Manager
never routes to (SiteRuntime-019, Low). Open findings: 3.

## Checklist coverage

| # | Category | Examined | Notes |
@@ -733,3 +751,126 @@ harness is a larger test-infrastructure task tracked separately and out of scope
Low-severity coverage finding; the highest-value untested paths the finding called out
(script timeout/failure/reply/self-stop) are now covered. Full module suite: 192 tests
green.

### SiteRuntime-017 — Instance Actor's live `_attributes` dictionary is shared by reference into child actor constructors

| | |
|--|--|
| Severity | Medium |
| Category | Concurrency & thread safety |
| Status | Open |
| Location | `src/ScadaLink.SiteRuntime/Actors/InstanceActor.cs:625`, `src/ScadaLink.SiteRuntime/Actors/InstanceActor.cs:675`, `src/ScadaLink.SiteRuntime/Actors/ScriptActor.cs:83`, `src/ScadaLink.SiteRuntime/Actors/AlarmActor.cs:93` |

**Description**

`InstanceActor.CreateChildActors` passes the Instance Actor's own mutable
`_attributes` field (a plain `Dictionary<string, object?>`) by reference into the
`Props.Create(...)` factory for every `ScriptActor` and `AlarmActor` (as the
`initialAttributes` constructor argument). Each child constructor then iterates
that dictionary to seed its `_attributeSnapshot`:

```csharp
if (initialAttributes != null)
    foreach (var kvp in initialAttributes)
        _attributeSnapshot[kvp.Key] = kvp.Value;
```

`Context.ActorOf` returns immediately; the child actor's constructor runs later on
the *child's* mailbox thread. Meanwhile the Instance Actor's `PreStart` returns and
the Instance Actor begins processing its mailbox — `HandleTagValueUpdate` and
`HandleAttributeValueChanged` both mutate `_attributes` (`_attributes[...] = ...`).
A DCL tag update that arrives before a child has finished its constructor copy
therefore mutates the dictionary on the Instance Actor thread while the child
thread is enumerating it. `Dictionary<,>` is explicitly not safe for concurrent
read/write: the enumeration can throw `InvalidOperationException` ("collection was
modified") — which surfaces as an `ActorInitializationException` and, under the
Instance Actor's `SupervisorStrategy`, **stops** the child (the strategy returns
`Stop` for `ActorInitializationException`). The script or alarm is then silently
absent for the life of the instance. A torn read of an entry is also possible. The
window is small but deterministically reachable on a busy site at startup/failover
— exactly the staggered-startup scenario the design is most concerned about.

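The failure mode described above can be reproduced in miniature on a single thread. The sketch below is only an illustration, not code from the module: `SharedDictionaryHazard` and the attribute names are hypothetical, and the owner's mutation is simulated inline rather than arriving from a second thread. It adds a new key mid-enumeration because that reliably invalidates an active `Dictionary<,>` enumerator.

```csharp
#nullable enable
using System;
using System.Collections.Generic;

public static class SharedDictionaryHazard
{
    // Mimics the child constructor's copy loop while the "owner" mutates the
    // same dictionary instance part-way through. Returns true if the copy
    // loop was aborted by InvalidOperationException.
    public static bool CopyFailsWhenOwnerAddsKey()
    {
        // Hypothetical attribute values; the real dictionary is InstanceActor._attributes.
        var attributes = new Dictionary<string, object?>
        {
            ["Temp"] = 21.5,
            ["Mode"] = "Auto",
        };
        var snapshot = new Dictionary<string, object?>();

        try
        {
            foreach (var kvp in attributes)
            {
                snapshot[kvp.Key] = kvp.Value;

                // Stand-in for a tag update handled by the owner while the
                // child is still enumerating: a new key is added.
                attributes["Pressure"] = 1.2;
            }
            return false;
        }
        catch (InvalidOperationException)
        {
            // The enumerator detected the modification on its next MoveNext().
            return true;
        }
    }

    public static void Main()
    {
        Console.WriteLine(CopyFailsWhenOwnerAddsKey());
    }
}
```

In the real actor system the mutation happens on another thread, so the outcome is timing-dependent and may also include inconsistent reads. The single-threaded version only shows why the enumeration is fragile when the dictionary instance is shared.
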
**Recommendation**

Do not share the live dictionary. Snapshot it on the Instance Actor thread before
constructing the child — e.g. pass `new Dictionary<string, object?>(_attributes)`
(or an immutable copy) into each `Props.Create`. The copy is made on the Instance
Actor thread inside `CreateChildActors`, so it is race-free, and each child gets a
private dictionary to seed from.

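A minimal sketch of the recommended snapshot, kept free of Akka.NET so it runs standalone. `AttributeSnapshotSketch` and its nested `Child` class are hypothetical stand-ins for `InstanceActor.CreateChildActors` and the `ScriptActor`/`AlarmActor` constructors. Only the `new Dictionary<string, object?>(...)` copy constructor is taken from the recommendation itself.

```csharp
#nullable enable
using System;
using System.Collections.Generic;

public static class AttributeSnapshotSketch
{
    // Hypothetical stand-in for a child actor that seeds its own snapshot
    // from the constructor argument, as the quoted constructor code does.
    public sealed class Child
    {
        private readonly Dictionary<string, object?> _attributeSnapshot =
            new Dictionary<string, object?>();

        public Child(IReadOnlyDictionary<string, object?>? initialAttributes)
        {
            if (initialAttributes != null)
                foreach (var kvp in initialAttributes)
                    _attributeSnapshot[kvp.Key] = kvp.Value;
        }

        public int Count => _attributeSnapshot.Count;
        public object? Get(string key) => _attributeSnapshot[key];
    }

    public static (int Count, object? Temp) Run()
    {
        // Stand-in for the owner's live, mutable attribute dictionary.
        var attributes = new Dictionary<string, object?> { ["Temp"] = 21.5 };

        // Copy on the owner's side, before anything is handed to the child.
        var copy = new Dictionary<string, object?>(attributes);

        // The owner keeps mutating its own dictionary afterwards...
        attributes["Temp"] = 99.0;
        attributes["Pressure"] = 1.2;

        // ...but the child only ever sees the private copy.
        var child = new Child(copy);
        return (child.Count, child.Get("Temp"));
    }

    public static void Main()
    {
        var result = Run();
        Console.WriteLine($"{result.Count} {result.Temp}");
    }
}
```

The copy is shallow: it protects the dictionary structure, not the objects stored as values. If attribute values are themselves mutable reference types, they would still be shared between owner and child.
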
**Resolution**

_Unresolved._

### SiteRuntime-018 — `ScriptExecutionActor` XML doc still claims a "dedicated blocking I/O dispatcher"

| | |
|--|--|
| Severity | Low |
| Category | Documentation & comments |
| Status | Open |
| Location | `src/ScadaLink.SiteRuntime/Actors/ScriptExecutionActor.cs:17` |

**Description**

The class-level XML summary on `ScriptExecutionActor` states "Runs on a dedicated
blocking I/O dispatcher." That is not what the code does. SiteRuntime-009 was
resolved by introducing `ScriptExecutionScheduler` (a bounded dedicated
`TaskScheduler`); the *actor itself and its mailbox* run on the **default** Akka
dispatcher, and only the script body runs on the scheduler's threads via
`Task.Factory.StartNew(..., scheduler)`. The resolution of SiteRuntime-009
explicitly chose the `TaskScheduler` route *instead of* a HOCON dispatcher and
even removed the "in production, configure a dedicated dispatcher" comments
elsewhere — but this stale summary line was missed. A reader is told the actor is
on a dedicated dispatcher when it is not, which is misleading when reasoning about
mailbox throughput and thread-pool pressure. (`AlarmExecutionActor` does not carry
the equivalent claim — its summary only says "Same pattern as ScriptExecutionActor.")

**Recommendation**

Correct the summary to describe the actual model: the actor runs on the default
dispatcher and the script body is dispatched onto the dedicated
`ScriptExecutionScheduler` (SiteRuntime-009). Align the wording with the accurate
comment already present at `ScriptExecutionActor.cs:71-73`.

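The split that the corrected summary needs to describe (the caller runs where it already is, and only the body runs on a bounded scheduler) can be sketched with the standard library alone. This is an assumption-laden illustration: the module's `ScriptExecutionScheduler` is not shown in this review, so `ConcurrentExclusiveSchedulerPair` stands in for it, and `SchedulerSketch`, `BoundedScheduler` and the concurrency limit of 2 are hypothetical.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

public static class SchedulerSketch
{
    // Stand-in for a bounded scheduler; NOT the module's ScriptExecutionScheduler.
    // At most two tasks queued to it execute concurrently.
    private static readonly TaskScheduler BoundedScheduler =
        new ConcurrentExclusiveSchedulerPair(TaskScheduler.Default, 2).ConcurrentScheduler;

    // The caller gets a Task back without blocking; only the delegate body is
    // executed under BoundedScheduler. The body reports whether it observes
    // that scheduler as TaskScheduler.Current.
    public static Task<bool> RunBodyOnScheduler()
    {
        return Task.Factory.StartNew(
            () => ReferenceEquals(TaskScheduler.Current, BoundedScheduler),
            CancellationToken.None,
            TaskCreationOptions.DenyChildAttach,
            BoundedScheduler);
    }

    public static void Main()
    {
        Console.WriteLine(RunBodyOnScheduler().GetAwaiter().GetResult());
    }
}
```

Nothing in this pattern moves the caller itself onto the bounded scheduler, which is the distinction the stale summary blurs: in the module's case the actor and its mailbox stay on the default dispatcher.
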
**Resolution**

_Unresolved._

### SiteRuntime-019 — Dead `DisableInstanceCommand` / `EnableInstanceCommand` handlers in `InstanceActor`

| | |
|--|--|
| Severity | Low |
| Category | Correctness & logic bugs |
| Status | Open |
| Location | `src/ScadaLink.SiteRuntime/Actors/InstanceActor.cs:106`, `src/ScadaLink.SiteRuntime/Actors/InstanceActor.cs:113` |

**Description**

`InstanceActor`'s constructor registers `Receive<DisableInstanceCommand>` and
`Receive<EnableInstanceCommand>` handlers that log and reply with a successful
`InstanceLifecycleResponse`. These handlers are unreachable. The Deployment Manager
is the only sender of those commands, and `DeploymentManagerActor.HandleDisable` /
`HandleEnable` handle the lifecycle entirely themselves — they call
`Context.Stop(actor)` (disable) or `CreateInstanceActor(...)` (enable) directly and
reply to the original sender from the Deployment Manager. Neither command is ever
`Forward`-ed or `Tell`-ed to the Instance Actor. The handlers are dead code, and
they are actively misleading: a maintainer reading `InstanceActor` would reasonably
believe disable/enable is partly an Instance-Actor responsibility, and the no-op
"true" reply implies an instance-side acknowledgement contract that does not exist.
If a future change *did* route these commands here, the disable handler would do
nothing useful (it does not stop children or tear down state — Akka does that when
the parent stops the actor).

**Recommendation**

Remove the two `Receive<...>` registrations and their handler bodies from
`InstanceActor`, since the Deployment Manager owns the disable/enable lifecycle.
If the intent is to keep them for a future instance-side hook, add an XML comment
stating that the Deployment Manager currently handles these and the handlers are a
reserved placeholder — but removal is preferred.

**Resolution**

_Unresolved._