fix(site-runtime): resolve SiteRuntime-012,013,015,016 — doc accuracy, shared LoggerFactory, execution-actor coverage; SiteRuntime-014 deferred

Joseph Doherty
2026-05-16 22:32:30 -04:00
parent b1ea78a9fd
commit dd7626da63
6 changed files with 404 additions and 18 deletions

@@ -8,7 +8,7 @@
| Last reviewed | 2026-05-16 |
| Reviewer | claude-agent |
| Commit reviewed | `9c60592` |
-| Open findings | 5 |
+| Open findings | 0 |
## Summary
@@ -521,7 +521,7 @@ literal/identifier non-detection, allowed-exception resolution); all 39 existing
|--|--|
| Severity | Low |
| Category | Concurrency & thread safety |
-| Status | Open |
+| Status | Resolved |
| Location | `src/ScadaLink.SiteRuntime/Scripts/ScopeAccessors.cs:28` |
**Description**
@@ -543,7 +543,18 @@ internal-only and exposing only the async API.
**Resolution**
-_Unresolved._
+Resolved 2026-05-16 (`pending commit`): re-triaged against the current source. The
+finding's own recommendation states the blocking is *acceptable* once SiteRuntime-009's
+dedicated script dispatcher exists, and SiteRuntime-009 is now Resolved
+(`ScriptExecutionActor`/`AlarmExecutionActor` run script bodies on the dedicated
+`ScriptExecutionScheduler` threads, confirmed in `ScriptExecutionActor.cs:74`). A
+blocked accessor therefore can no longer starve the shared `ThreadPool` or Akka
+dispatchers; it can only hold up a dedicated script thread. The remaining defect was the
+misleading class XML comment, which only said "Reads block on the actor Ask" with no
+thread-model context. The `AttributeAccessor` XML doc now documents the dispatcher
+containment (SiteRuntime-009) explicitly and still steers authors to the async
+`GetAsync`/`SetAsync` variants. No behavioural change: this is a documentation finding,
+and the existing `ScopeAccessorTests` continue to pass.
### SiteRuntime-013 — `HandleUnsubscribeDebugView` does nothing despite documented behaviour
@@ -551,7 +562,7 @@ _Unresolved._
|--|--|
| Severity | Low |
| Category | Documentation & comments |
-| Status | Open |
+| Status | Resolved |
| Location | `src/ScadaLink.SiteRuntime/Actors/InstanceActor.cs:414` |
**Description**
@@ -573,7 +584,18 @@ comment to state explicitly that subscription teardown is handled by
**Resolution**
-_Unresolved._
+Resolved 2026-05-16 (`pending commit`): root cause confirmed. The Instance Actor
+holds no per-subscriber state, so `HandleUnsubscribeDebugView` genuinely has nothing to
+remove; the real debug-stream subscription lifecycle lives in `SiteStreamManager`
+(Subscribe/Unsubscribe/RemoveSubscriber). The recommendation's "correct the XML comment"
+option was taken, because removing the handler would still leave
+`UnsubscribeDebugViewRequest` routed here from
+`DeploymentManagerActor.RouteDebugViewUnsubscribe`, and the no-op acknowledgement is
+harmless. The XML doc on `HandleUnsubscribeDebugView` now states explicitly that it is a
+deliberate no-op acknowledgement and that teardown is handled by `SiteStreamManager`;
+the log message likewise notes "(no-op; subscription teardown handled by
+SiteStreamManager)". This is a documentation-only finding with no observable behaviour
+to regression-test, so no new test was added; the existing `InstanceActor`/debug-view
+tests continue to pass.
### SiteRuntime-014 — Trigger-expression evaluation blocks the coordinator actor thread
@@ -581,7 +603,7 @@ _Unresolved._
|--|--|
| Severity | Low |
| Category | Akka.NET conventions |
-| Status | Open |
+| Status | Deferred |
| Location | `src/ScadaLink.SiteRuntime/Actors/ScriptActor.cs:219`, `src/ScadaLink.SiteRuntime/Actors/AlarmActor.cs:389` |
**Description**
@@ -605,7 +627,26 @@ without the Roslyn scripting `RunAsync` machinery.
**Resolution**
-_Unresolved._
+Deferred 2026-05-16 (`pending commit`): root cause confirmed. `EvaluateExpressionTrigger`
+(ScriptActor) and `EvaluateExpression` (AlarmActor) call
+`_compiledTriggerExpression.RunAsync(...).GetAwaiter().GetResult()` directly inside the
+`AttributeValueChanged` handler, on the coordinator actor's default (thread-pool-backed)
+dispatcher, blocking the mailbox for up to the 2 s timeout. Re-triaged from Open to
+**Deferred** rather than fixed: neither recommended fix stays cleanly in-module without
+a design decision. (a) **Off-thread eval + pipe-back** changes the actor's concurrency
+model: the evaluation carries edge-tracking state (`_lastExpressionResult`) and a
+mutable `_attributeSnapshot`, and multiple `AttributeValueChanged` messages can arrive
+while an evaluation is in flight, so a correct fix must decide overlapping-evaluation
+semantics (coalesce / serialize / drop) and the snapshot-coherence contract, which is a
+behaviour change to the trigger model. (b) **Pre-compile to a plain delegate** would
+require changing the compilation contract: the trigger expression is produced as a
+Roslyn `Script<object?>` by `ScriptCompilationService.CompileTriggerExpression`, which is
+also the security boundary (SiteRuntime-011 trust validation); swapping the artifact
+type is a cross-component change touching the Template Engine / Deployment Manager
+compile pipeline. Given Low severity, a bounded 2 s worst case, and the inline note that
+trigger expressions are trusted, compile-checked, and expected to be cheap, this is left
+Deferred pending a design decision on trigger-evaluation concurrency rather than forcing
+an out-of-scope or messaging-contract-changing fix.
### SiteRuntime-015 — `LoggerFactory` created per Instance Actor and never disposed
@@ -613,7 +654,7 @@ _Unresolved._
|--|--|
| Severity | Low |
| Category | Performance & resource management |
-| Status | Open |
+| Status | Resolved |
| Location | `src/ScadaLink.SiteRuntime/Actors/DeploymentManagerActor.cs:746` |
**Description**
@@ -634,7 +675,18 @@ up per child. Do not create a fresh `LoggerFactory` in a hot creation path.
**Resolution**
-_Unresolved._
+Resolved 2026-05-16 (`pending commit`): root cause confirmed. `CreateInstanceActor`
+created a `new LoggerFactory()` per Instance Actor that was never disposed and was
+detached from the host's logging providers. `DeploymentManagerActor` now holds a single
+`_loggerFactory` field, resolved once in the constructor from (in order) a new optional
+`ILoggerFactory` constructor parameter, the injected `IServiceProvider`, or
+`NullLoggerFactory.Instance` as a last resort, never a per-instance allocation.
+`CreateInstanceActor` mints the `InstanceActor` logger from this shared factory, so
+loggers are routed through the application's configured providers and no factory leaks.
+Regression test:
+`DeploymentManagerLoggerFactoryTests.CreateInstanceActor_ReusesInjectedLoggerFactory_ForEveryInstance`
+injects a counting `ILoggerFactory` and asserts it is used once per created Instance
+Actor. It was confirmed to fail (0 calls) against the pre-fix `new LoggerFactory()` code
+and to pass after the fix.
### SiteRuntime-016 — Short-lived execution actors, replication actor, and repositories are untested
@@ -642,7 +694,7 @@ _Unresolved._
|--|--|
| Severity | Low |
| Category | Testing coverage |
-| Status | Open |
+| Status | Resolved |
| Location | `tests/ScadaLink.SiteRuntime.Tests/` |
**Description**
@@ -666,4 +718,18 @@ synthetic-ID lookup, missing-row behaviour).
**Resolution**
-_Unresolved._
+Resolved 2026-05-16 (`pending commit`): re-triaged against the current test sources.
+`SiteExternalSystemRepository` and `SiteNotificationRepository` are already covered by
+`Repositories/SiteRepositoryTests.cs` (added when SiteRuntime-006/007 were resolved:
+round-trip read and synthetic-ID-stable-across-restart). The execution-actor gap is now
+closed by a new `Actors/ExecutionActorTests.cs` with six tests covering
+`ScriptExecutionActor` (success → `ScriptCallResult` reply + PoisonPill self-stop;
+script-throws → failure reply + stop; cooperative timeout → failure reply + stop;
+no-`replyTo` fire-and-forget still self-stops) and `AlarmExecutionActor` (success →
+self-stop; on-trigger throws → still self-stops). `SiteReplicationActor` is *not* covered
+here: it depends on `Cluster.Get(Context.System)` and so requires a clustered
+`ActorSystem` HOCON harness that does not yet exist in this test project. Adding that
+harness is a larger test-infrastructure task, tracked separately and out of scope for a
+Low-severity coverage finding; the highest-value untested paths the finding called out
+(script timeout/failure/reply/self-stop) are now covered. Full module suite: 192 tests
+green.
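
The SiteRuntime-015 resolution above describes a three-step fallback for obtaining the shared `ILoggerFactory`, plus a counting factory used by the regression test. The following is a minimal sketch of that idea, not the code from this commit: `LoggerFactoryResolver`, `CountingLoggerFactory`, and `Example` are hypothetical names, and the real `DeploymentManagerActor` constructor and test are not shown in this diff. It assumes only the `Microsoft.Extensions.Logging.Abstractions` package.

```csharp
#nullable enable
using System;
using Microsoft.Extensions.Logging;
using Microsoft.Extensions.Logging.Abstractions;

// Hypothetical helper mirroring the resolution order described for SiteRuntime-015:
// explicit constructor argument, then the service provider, then a null factory.
public static class LoggerFactoryResolver
{
    public static ILoggerFactory Resolve(ILoggerFactory? injected, IServiceProvider? services)
    {
        if (injected is not null)
            return injected;

        // IServiceProvider.GetService returns null when the service is not registered.
        if (services?.GetService(typeof(ILoggerFactory)) is ILoggerFactory fromContainer)
            return fromContainer;

        // Last resort: a shared no-op singleton, never a fresh LoggerFactory per call.
        return NullLoggerFactory.Instance;
    }
}

// Hypothetical test double: counts CreateLogger calls so a test can assert that
// one shared factory served every created logger.
public sealed class CountingLoggerFactory : ILoggerFactory
{
    public int CreateLoggerCalls { get; private set; }

    public ILogger CreateLogger(string categoryName)
    {
        CreateLoggerCalls++;
        return NullLogger.Instance;
    }

    public void AddProvider(ILoggerProvider provider) { }

    public void Dispose() { }
}

public static class Example
{
    public static void Main()
    {
        var counting = new CountingLoggerFactory();
        ILoggerFactory shared = LoggerFactoryResolver.Resolve(counting, services: null);

        // Stand-in for creating three Instance Actors from the same shared factory.
        for (var i = 0; i < 3; i++)
            shared.CreateLogger($"InstanceActor-{i}");

        Console.WriteLine(counting.CreateLoggerCalls); // prints 3
    }
}
```

Resolving the factory once and reusing it is what lets a test of this shape distinguish the fixed code from the pre-fix behaviour: a per-instance `new LoggerFactory()` would never touch the injected factory, so the counter would stay at 0.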