docs(code-reviews): re-review batch 4 at 39d737e — SiteEventLogging, SiteRuntime, StoreAndForward, TemplateEngine

11 new findings: SiteEventLogging-012..014, SiteRuntime-017..019, StoreAndForward-015..017, TemplateEngine-015..016.
Joseph Doherty
2026-05-17 00:51:58 -04:00
parent 3b3760f026
commit 0ba4e49e11
5 changed files with 613 additions and 27 deletions

@@ -5,10 +5,10 @@
| Module | `src/ScadaLink.SiteRuntime` |
| Design doc | `docs/requirements/Component-SiteRuntime.md` |
| Status | Reviewed |
| Last reviewed | 2026-05-16 |
| Last reviewed | 2026-05-17 |
| Reviewer | claude-agent |
| Commit reviewed | `9c60592` |
| Open findings | 0 |
| Commit reviewed | `39d737e` |
| Open findings | 3 |
## Summary
@@ -28,6 +28,24 @@ in a comment but ships it anyway). Test coverage exists for the coordinator acto
persistence and scripting, but the short-lived execution actors, the replication
actor, and the repositories are untested.
#### Re-review 2026-05-17 (commit `39d737e`)
The module was re-reviewed at commit `39d737e`. No source under
`src/ScadaLink.SiteRuntime` has changed since the previous review at `9c60592`
(the only intervening commits are code-review documentation updates), so all of
SiteRuntime-001..013, 015, 016 remain Resolved and SiteRuntime-014 remains
Deferred — its Deferred justification (a trigger-evaluation concurrency design
decision is required before either recommended fix can land in-module) still
holds verbatim against the unchanged `ScriptActor`/`AlarmActor` source. The
re-review nonetheless worked through all 10 checklist categories afresh and
surfaced three new findings that the prior pass did not record: a cross-thread
`Dictionary` enumeration race when the Instance Actor's live `_attributes`
dictionary is handed by reference into child `ScriptActor`/`AlarmActor`
constructors (SiteRuntime-017, Medium); a stale `ScriptExecutionActor` XML doc
that still claims a "dedicated blocking I/O dispatcher" (SiteRuntime-018, Low);
and two dead lifecycle handlers in `InstanceActor` that the Deployment Manager
never routes to (SiteRuntime-019, Low). Open findings: 3.
## Checklist coverage
| # | Category | Examined | Notes |
@@ -733,3 +751,126 @@ harness is a larger test-infrastructure task tracked separately and out of scope
Low-severity coverage finding; the highest-value untested paths the finding called out
(script timeout/failure/reply/self-stop) are now covered. Full module suite: 192 tests
green.
### SiteRuntime-017 — Instance Actor's live `_attributes` dictionary is shared by reference into child actor constructors
| | |
|--|--|
| Severity | Medium |
| Category | Concurrency & thread safety |
| Status | Open |
| Location | `src/ScadaLink.SiteRuntime/Actors/InstanceActor.cs:625`, `src/ScadaLink.SiteRuntime/Actors/InstanceActor.cs:675`, `src/ScadaLink.SiteRuntime/Actors/ScriptActor.cs:83`, `src/ScadaLink.SiteRuntime/Actors/AlarmActor.cs:93` |
**Description**
`InstanceActor.CreateChildActors` passes the Instance Actor's own mutable
`_attributes` field (a plain `Dictionary<string, object?>`) by reference into the
`Props.Create(...)` factory for every `ScriptActor` and `AlarmActor` (as the
`initialAttributes` constructor argument). Each child constructor then iterates
that dictionary to seed its `_attributeSnapshot`:
```csharp
if (initialAttributes != null)
    foreach (var kvp in initialAttributes)
        _attributeSnapshot[kvp.Key] = kvp.Value;
```
`Context.ActorOf` returns immediately; the child actor's constructor runs later, on
a dispatcher thread servicing the *child's* mailbox. Meanwhile the Instance Actor's `PreStart` returns and
the Instance Actor begins processing its mailbox — `HandleTagValueUpdate` and
`HandleAttributeValueChanged` both mutate `_attributes` (`_attributes[...] = ...`).
A DCL tag update that arrives before a child has finished its constructor copy
therefore mutates the dictionary on the Instance Actor thread while the child
thread is enumerating it. `Dictionary<,>` is explicitly not safe for concurrent
read/write: the enumeration can throw `InvalidOperationException` ("collection was
modified") — which surfaces as an `ActorInitializationException` and, under the
Instance Actor's `SupervisorStrategy`, **stops** the child (the strategy returns
`Stop` for `ActorInitializationException`). The script or alarm is then silently
absent for the life of the instance. A torn read of an entry is also possible. The
window is small but realistically reachable on a busy site at startup/failover,
which is exactly the staggered-startup scenario the design is most concerned about.
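
The enumerator invalidation described above can be reproduced without threads. The following is a minimal sketch, assuming nothing about the module beyond the `Dictionary<string, object?>` type: adding a key while a `foreach` is in flight is the single-threaded analogue of a tag update landing mid-copy. `EnumerationRaceDemo` and the `late-arriving-tag` key are hypothetical names used only for illustration.

```csharp
using System;
using System.Collections.Generic;

public static class EnumerationRaceDemo
{
    // Returns true if copying `live` fails once a new key is added mid-loop.
    public static bool ThrowsWhenMutatedDuringEnumeration(Dictionary<string, object?> live)
    {
        var snapshot = new Dictionary<string, object?>();
        try
        {
            foreach (var kvp in live)
            {
                snapshot[kvp.Key] = kvp.Value;
                // Stand-in for a tag update arriving on the owner's thread mid-copy.
                live["late-arriving-tag"] = 1;
            }
        }
        catch (InvalidOperationException)
        {
            // "Collection was modified; enumeration operation may not execute."
            return true;
        }
        return false;
    }

    public static void Main()
    {
        var attributes = new Dictionary<string, object?> { ["Temp"] = 21.5, ["Mode"] = "Auto" };
        Console.WriteLine(ThrowsWhenMutatedDuringEnumeration(attributes)); // True
    }
}
```

The sketch adds a new key because that reliably invalidates the enumerator on every .NET runtime; whether overwriting an existing key does so depends on the runtime version. The real cross-thread case is additionally exposed to inconsistent reads, which this single-threaded stand-in cannot show.
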
**Recommendation**
Do not share the live dictionary. Snapshot it on the Instance Actor thread before
constructing the child — e.g. pass `new Dictionary<string, object?>(_attributes)`
(or an immutable copy) into each `Props.Create`. The copy is made on the Instance
Actor thread inside `CreateChildActors`, so it is race-free, and each child gets a
private dictionary to seed from.
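
The recommended snapshot can be sketched as below. This is an illustration using only the base class library, not the module's code: `AttributeSnapshotDemo` and `Snapshot` are hypothetical names, and the later mutation stands in for what `HandleTagValueUpdate` does to `_attributes`. In the actor, the copy would be taken into a local variable inside `CreateChildActors` before the `Props.Create(...)` call.

```csharp
using System;
using System.Collections.Generic;

public static class AttributeSnapshotDemo
{
    // Hypothetical helper: call on the owning (Instance Actor) thread only.
    // The copy constructor enumerates `live` once, here, where no one else mutates it.
    public static Dictionary<string, object?> Snapshot(Dictionary<string, object?> live)
        => new Dictionary<string, object?>(live);

    public static void Main()
    {
        var attributes = new Dictionary<string, object?> { ["Temp"] = 21.5, ["Mode"] = "Auto" };

        // Taken before the child is created; this is what the child would receive.
        var initialAttributes = Snapshot(attributes);

        // The owner keeps mutating its live dictionary afterwards.
        attributes["Temp"] = 99.0;
        attributes["Pressure"] = 3;

        // The child's seeding loop now enumerates a private copy.
        var attributeSnapshot = new Dictionary<string, object?>();
        foreach (var kvp in initialAttributes)
            attributeSnapshot[kvp.Key] = kvp.Value;

        Console.WriteLine(attributeSnapshot.Count);                    // 2
        Console.WriteLine(attributeSnapshot.ContainsKey("Pressure")); // False
    }
}
```

The copy is shallow: keys and value references are duplicated, not the value objects themselves. That is sufficient for the enumeration race, but it assumes attribute values are immutable or never mutated in place.
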
**Resolution**
_Unresolved._
### SiteRuntime-018 — `ScriptExecutionActor` XML doc still claims a "dedicated blocking I/O dispatcher"
| | |
|--|--|
| Severity | Low |
| Category | Documentation & comments |
| Status | Open |
| Location | `src/ScadaLink.SiteRuntime/Actors/ScriptExecutionActor.cs:17` |
**Description**
The class-level XML summary on `ScriptExecutionActor` states "Runs on a dedicated
blocking I/O dispatcher." That is not what the code does. SiteRuntime-009 was
resolved by introducing `ScriptExecutionScheduler` (a bounded dedicated
`TaskScheduler`); the *actor itself and its mailbox* run on the **default** Akka
dispatcher, and only the script body runs on the scheduler's threads via
`Task.Factory.StartNew(..., scheduler)`. The resolution of SiteRuntime-009
explicitly chose the `TaskScheduler` route *instead of* a HOCON dispatcher and
even removed the "in production, configure a dedicated dispatcher" comments
elsewhere — but this stale summary line was missed. A reader is told the actor is
on a dedicated dispatcher when it is not, which is misleading when reasoning about
mailbox throughput and thread-pool pressure. (`AlarmExecutionActor` does not carry
the equivalent claim — its summary only says "Same pattern as ScriptExecutionActor.")
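
To make the distinction concrete, here is a minimal sketch of the general technique of running work on a dedicated, bounded `TaskScheduler` while the caller stays on its own thread. It is not the module's `ScriptExecutionScheduler`; `DedicatedThreadScheduler`, the thread names, and the thread count are assumptions chosen for illustration.

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical stand-in for a bounded, dedicated scheduler.
public sealed class DedicatedThreadScheduler : TaskScheduler, IDisposable
{
    private readonly BlockingCollection<Task> _queue = new();
    private readonly List<Thread> _threads = new();

    public DedicatedThreadScheduler(int threadCount)
    {
        for (var i = 0; i < threadCount; i++)
        {
            var thread = new Thread(() =>
            {
                // Each worker drains the shared queue until CompleteAdding is called.
                foreach (var task in _queue.GetConsumingEnumerable())
                    TryExecuteTask(task);
            })
            { IsBackground = true, Name = $"script-exec-{i}" };
            thread.Start();
            _threads.Add(thread);
        }
    }

    public override int MaximumConcurrencyLevel => _threads.Count;

    protected override void QueueTask(Task task) => _queue.Add(task);

    // Never inline, so queued work cannot run on the caller's thread.
    protected override bool TryExecuteTaskInline(Task task, bool taskWasPreviouslyQueued) => false;

    protected override IEnumerable<Task> GetScheduledTasks() => _queue.ToArray();

    public void Dispose() => _queue.CompleteAdding();
}

public static class SchedulerDemo
{
    public static async Task Main()
    {
        using var scheduler = new DedicatedThreadScheduler(threadCount: 2);

        // The caller (an actor, in the module's case) stays on its own thread;
        // only the delegate body runs on one of the scheduler's threads.
        var workerName = await Task.Factory.StartNew(
            () => Thread.CurrentThread.Name,
            CancellationToken.None,
            TaskCreationOptions.DenyChildAttach,
            scheduler);

        Console.WriteLine(workerName); // one of the "script-exec-N" threads
    }
}
```

In this arrangement the thread that calls `Task.Factory.StartNew` is unaffected by the scheduler, which is why a summary line claiming the actor itself runs on a dedicated dispatcher misdescribes the model.
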
**Recommendation**
Correct the summary to describe the actual model: the actor runs on the default
dispatcher and the script body is dispatched onto the dedicated
`ScriptExecutionScheduler` (SiteRuntime-009). Align the wording with the accurate
comment already present at `ScriptExecutionActor.cs:71-73`.
**Resolution**
_Unresolved._
### SiteRuntime-019 — Dead `DisableInstanceCommand` / `EnableInstanceCommand` handlers in `InstanceActor`
| | |
|--|--|
| Severity | Low |
| Category | Correctness & logic bugs |
| Status | Open |
| Location | `src/ScadaLink.SiteRuntime/Actors/InstanceActor.cs:106`, `src/ScadaLink.SiteRuntime/Actors/InstanceActor.cs:113` |
**Description**
`InstanceActor`'s constructor registers `Receive<DisableInstanceCommand>` and
`Receive<EnableInstanceCommand>` handlers that log and reply with a successful
`InstanceLifecycleResponse`. These handlers are unreachable. The Deployment Manager
is the only sender of those commands, and `DeploymentManagerActor.HandleDisable` /
`HandleEnable` handle the lifecycle entirely themselves — they call
`Context.Stop(actor)` (disable) or `CreateInstanceActor(...)` (enable) directly and
reply to the original sender from the Deployment Manager. Neither command is ever
`Forward`-ed or `Tell`-ed to the Instance Actor. The handlers are dead code, and
they are actively misleading: a maintainer reading `InstanceActor` would reasonably
believe disable/enable is partly an Instance-Actor responsibility, and the no-op
"true" reply implies an instance-side acknowledgement contract that does not exist.
If a future change *did* route these commands here, the disable handler would do
nothing useful (it does not stop children or tear down state — Akka does that when
the parent stops the actor).
**Recommendation**
Remove the two `Receive<...>` registrations and their handler bodies from
`InstanceActor`, since the Deployment Manager owns the disable/enable lifecycle.
If the intent is to keep them for a future instance-side hook, add an XML comment
stating that the Deployment Manager currently handles these and the handlers are a
reserved placeholder — but removal is preferred.
**Resolution**
_Unresolved._